Chapter 3
Bioinformatics intervention in functional
genomics: current status and future
perspective—an overview
Swati Sharma1, Ashwani Kumar1, Dinesh Yadav2 and Manoj Kumar Yadav1
1Department of Agricultural Biotechnology, College of Agriculture, Sardar Vallabh Bhai Patel University of Agriculture and Technology, Meerut, Uttar Pradesh, India, 2Department of Biotechnology, D.D.U. Gorakhpur University, Gorakhpur, Uttar Pradesh, India
3.1 Introduction
The information generated by postgenomic and high-throughput techniques is no longer a bottleneck in understanding and tackling biological processes. Biological problems can be unraveled by sequencing DNA and proteins and by assessing the resulting molecular data with computational tools and informatics algorithms (Khan, 2018). Bioinformatics plays a major role in molecular biology, in fields ranging from human cancer studies to the study of microbial pathogens (Katara, 2014). High-throughput techniques such as DNA microarrays, ChIP-on-chip, protein chips, and, more recently, new-generation sequencers generate vast amounts of data that researchers must handle from a global perspective, and these data need to be analyzed using bioinformatics tools. The Human Genome Project, the first large genomic initiative, was launched in 1990 and completed in 2003. Bioinformatics helped to decipher human genes and provided information about their structure and organization, allowing researchers to learn more about the functions of genes and proteins in similar and dissimilar organisms. The principal challenge was determining the base-by-base order of the nucleotides that together make up the human genome (Collins & Fink, 1995). Arabidopsis thaliana was the first plant, and the third multicellular organism after Caenorhabditis elegans and Drosophila melanogaster, to be completely sequenced (Tabata et al., 2000). Its completed sequence became a sound basis for further investigation and showed that high-throughput technologies would dramatically increase knowledge of complex biological networks (Hidalgo, 2003). Bioinformatics is an interdisciplinary subject that combines biological and information science to develop new methods and software tools for understanding biological data. It plays a key role in comprehensive analysis and in understanding gene functions at variable levels of protein expression. It is also used to compare genetic and genomic data and helps to explain various evolutionary aspects of molecular biology. Several sequence search engines are available: NCBI BLASTN and BLASTP for homology-based search, OrthoMCL for orthologous sequence search, and MCScan and MCScanX for paralogous sequence search. Biological databases such as the European Molecular Biology Laboratory (EMBL) database and the DNA Data Bank of Japan (DDBJ) are used to store and distribute sequence data. To speed up analysis, bioinformatics has accumulated many resources, facilities, and databases, which are regularly updated with new information and knowledge. This review describes various bioinformatics methods for solving biological problems related to functional genomics.
3.2 Functional genomic approaches

Functional genomics may be defined as the development and application of global (genome-wide or system-wide) experimental and systematic approaches that assess gene function by using the information provided by structural genomics (Bouchez & Höfte, 1998). It deals with the study of the genes and intergenic regions of the genome that contribute to different biological processes. The main goal of functional genomics is to understand how the different components of a biological system work together to produce a particular phenotype. Functional genomic approaches operate mainly at the DNA level (genomics and epigenomics), the RNA level (transcriptomics), the protein level (proteomics), and the metabolite level (metabolomics).
Bioinformatics in Agriculture. DOI: https://doi.org/10.1016/B978-0-323-89778-5.00028-3
© 2022 Elsevier Inc. All rights reserved.
SECTION | I Bioinformatics and next generation sequencing technologies
3.3 Serial analysis of gene expression

Serial analysis of gene expression (SAGE) is a method for identifying and quantifying the transcripts of a eukaryotic genome. Its basic principle is the determination of normal gene structure and the identification of structural changes in an abnormal genome (Wang, 2004). It represents each mRNA by a short sequence tag and then concatenates the tags for cloning so that they can be sequenced. The technique does not require prior knowledge of the gene of interest. Velculescu, Zhang, Vogelstein, and Kinzler (1995) developed it as a high-throughput method for determining the absolute abundance of every transcript in a population of cells (Fig. 3.1). mRNA obtained from the cells is converted into double-stranded cDNA. The cDNA is digested with a 4-bp cutter, the “anchoring enzyme” NlaIII, and the poly-A proximal ends are collected and ligated to a linker fragment. This linker fragment harbors a 5′-GGGAC-3′ sequence, the recognition site of the Type IIS restriction endonuclease BsmFI, which cleaves the cDNA 15 bp away from the recognition site in the 3′ direction. Treating the linker-ligated cDNA with BsmFI therefore releases a 15-bp fragment, called a tag, from a defined position of each cDNA. The tags are concatenated and cloned into a plasmid vector, which is then sequenced after removal of the linker fragment. Generally, around 10,000–100,000 tags may be analyzed for a given sample. The number of times each tag occurs in the total sample represents the abundance of the transcript that corresponds to the tag. The next main step, tag annotation, is to identify the gene that corresponds to each tag. The 15-bp tag sequence is generally used as a query in a BLAST search of the expressed sequence tag (EST) or cDNA databases of the organism of interest (Altschul, Gish, Miller, Myers, & Lipman, 1990). The tag counts and tag annotations are then combined into a gene expression profile. Comparing the gene expression profiles of two samples that were treated differently shows which genes are up- or downregulated in response to the particular treatment.
FIGURE 3.1 A SAGE procedure. The AE used is NlaIII and the TE used in the procedure is BsmFI. Boxes A and B are the independent linkers, the 3′ portions of which are designed to contain the TE sequence. Transcript-derived tag sequences are denoted by Ns. The blunt-end ligation step is denoted by * and discussed later in the text. AE, anchoring enzyme; SAGE, serial analysis of gene expression; TE, tagging enzyme. Adapted from Yamamoto, M., Wakatsuki, T., Hada, A., & Ryo, A. (2001). Use of serial analysis of gene expression (SAGE) technology. Journal of Immunological Methods, 250(1–2), 45–66.
In short, the SAGE procedure comprises the following steps:
• Isolate the mRNA of an input sample (e.g., a tumor).
• Extract a small portion of sequence from each mRNA molecule for use in the analysis.
• Link these small sequences together to form a longer chain, or concatemer.
• Clone these chains into a vector that can be taken up by bacteria.
• Sequence the chains with a high-throughput sequencer.
• Process the data by computer to count the small sequence tags.
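The final counting step, and the profile comparison described above, can be illustrated with a minimal Python sketch. It is a simplified illustration, not the published SAGE software: it assumes each cDNA is given as a plain string, takes the tag as the bases immediately 3′ of the last NlaIII site (CATG), and uses a hypothetical default tag length of 10 bases, which is passed as a parameter because the tag length quoted in the text varies. The function names and the pseudocount are likewise assumptions made for the example.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # recognition sequence of the anchoring enzyme NlaIII
TAG_LENGTH = 10       # assumed tag length; adjust to the protocol in use


def extract_tag(cdna, tag_length=TAG_LENGTH):
    """Return the tag adjacent to the 3'-most anchor site, or None."""
    pos = cdna.rfind(ANCHOR_SITE)  # poly-A proximal (last) NlaIII site
    if pos == -1:
        return None  # transcript has no anchor site, so it yields no tag
    start = pos + len(ANCHOR_SITE)
    tag = cdna[start:start + tag_length]
    # Discard tags that run off the end of the sequence
    return tag if len(tag) == tag_length else None


def count_tags(cdnas, tag_length=TAG_LENGTH):
    """Count how often each tag occurs in a collection of cDNA sequences."""
    tags = (extract_tag(seq, tag_length) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)


def compare_profiles(control, treated, pseudocount=1):
    """Return treated/control count ratios; the pseudocount avoids division by zero."""
    all_tags = set(control) | set(treated)
    return {
        tag: (treated.get(tag, 0) + pseudocount) / (control.get(tag, 0) + pseudocount)
        for tag in all_tags
    }


if __name__ == "__main__":
    # Toy sequences invented for illustration only
    control = count_tags(["AAACATGACGTACGTAATTT", "AAACATGACGTACGTAATTT"])
    treated = count_tags(["AAACATGACGTACGTAATTT"])
    print(compare_profiles(control, treated))
```

A ratio above 1 suggests upregulation in the treated sample and a ratio below 1 suggests downregulation. A real analysis would also normalize for library size and test for statistical significance, which this sketch omits.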
USAGE is a web-based application comprising a set of tools to compare and analyze SAGE data. It is accessible at http://www.cmbi.kun.nl/usage free of cost for academic institutions, and it enhances the functionality and flexibility of SAGE data analysis (Van Kampen et al., 2000). Some of the SAGE databases are:
1. SAGEnet: This database (http://www.sagenet.org) is maintained by the Vogelstein/Kinzler Lab at Johns Hopkins. It mainly covers colon cancer, pancreatic cancer, and some of the corresponding normal tissues.
2. SAGEmap: This database was developed by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) and the NIH Cancer Genome Anatomy Project (CGAP). It is a public gene expression repository and is unique in many ways.
3. Genzyme’s SAGE database: This database is used to create SAGE tag libraries for contracting parties. It is also available through other agencies such as Celera Genomics and Compugen.
Besides these, a few other SAGE analysis tools are available, such as SAGE300. Because SAGE data are obtained by sequencing short DNA tags, they may contain sequencing errors (Tuteja & Tuteja, 2004).
3.3.1 Advantages of serial analysis of gene expression

1. SAGE can be an effective tool in human cancer studies, where gene expression profiles from cancer tissue and the corresponding normal tissue are compared. A large number of genes have been recognized as tumor-specific in this way, and Northern blot analysis has been used to confirm the differential expression of the related genes (Yamamoto et al., 2001).
2. SAGE is very helpful in areas such as cardiovascular biology, stem cell biology, cardiovascular development, angiogenesis, atherosclerosis, and lipid regulation. This is mainly due to the electronic nature of SAGE databases, which allows libraries produced by different investigators to be compared directly. The CGAP genome annotation initiative can be used for gene expression queries on the human heart SAGE library (Patino et al., 2002).
3. SAGE can be applied in immunological studies of human monocytes, macrophages, and their differentiated descendants. Comparing the SAGE profiles of related cells showed that macrophages induced by granulocyte macrophage-colony stimulating factor (GM-CSF) and by M-CSF expressed similar sets of genes, implying functional similarity (Chen, Centola, Altschul, & Metzger, 1998).
3.3.2 Drawbacks of serial analysis of gene expression technique

1. It does not establish the true expression level of a gene.
2. The tag obtained after SAGE analysis is only 10 bases long, which makes it difficult to assign a tag to a specific transcript with accuracy.
3. Two different genes could have the same tag, and a gene that is alternatively spliced could have different tags at the 3′ ends.
4. Assigning an mRNA transcript to each tag becomes even more difficult and uncertain when sequencing errors enter the process.
3.4 DNA microarray

DNA microarrays comprise many microscopic DNA spots (probes) attached to a solid surface, such as glass, a silicon chip, or microscopic beads (Illumina). Labeled single-stranded DNA or antisense RNA fragments from a sample of interest are hybridized to the microarray under high-stringency conditions. Each probe is identified by its location on the array, and the amount of hybridization detected at that location reflects the level of the corresponding nucleic acid in the original sample (Bunnik & Le Roch, 2013) (Fig. 3.2).
3.4.1 Applications of microarray

1. Microarrays help to examine large numbers of archived or current samples. They have also proved effective in estimating the role of a certain marker in tumors.
2. DNA microarrays can analyze a whole bacterial genome using a small amount of DNA, which is valuable given the immense increase in resistant bacteria that cause infections and lead to antibiotic failure (Govindarajan et al., 2012).
3. Drug target characterization, identification, and selection.
4. Study of the cellular response to bacterial infection.
5. Diagnosis of a presumed genetic disease by testing for the presence of mutations.
3.4.2 Drawbacks of microarray

1. DNA microarrays can trace many samples simultaneously, but the procedure is complicated.
2. Although it is a popular technology that handles thousands of genes at once, it requires proficiency and skill in data normalization and analysis.
3. The technique works only for predefined sequences.
4. Although the technique is based on hybridization, it requires high-power computing facilities.
FIGURE 3.2 Schematic representation of steps of microarray.

3.4.3 Bioinformatics tools for microarray data analysis

The data collected in a microarray experiment produce extremely large files that must be examined to obtain results. A variety of software has been developed to make this process easier. The Affymetrix GeneChip platform, for example, is one of the most widely used platforms for studying gene expression. The following software is available for Affymetrix data analysis.
3.4.3.1 GeneChip Operating Software

It handles hardware management, image analysis, expression assessment, and data normalization, and it also estimates quality control parameters.
3.4.3.2 Affymetrix Expression Console Software

The software summarizes probe sets by quantifying and normalizing the expression arrays of GeneChips. It is equipped with the Microarray Suite 5.0 (MAS5) normalization algorithm, Probe Logarithmic Intensity Error Estimation (PLIER) normalization, and Robust Multichip Analysis (RMA) normalization.
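RMA includes a quantile normalization step, which forces the intensity distributions of all arrays to be identical. The Python sketch below illustrates that single step on toy data. It is not the Affymetrix or Bioconductor implementation: it omits the background correction and probe-set summarization steps of RMA, assumes each array is a plain list of intensities of equal length, and resolves ties simply by input order. The function name is hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per array)."""
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")

    # Mean intensity at each rank across all arrays
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]  # replace value by the rank mean
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy arrays with three probes each, invented for illustration
    chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    print(quantile_normalize(chips))
```

After normalization, every array contains the same set of values, so the remaining differences between arrays lie only in which probes hold which ranks.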
Moreover, the following are some of the software packages that are free for academic use:
RMAExpress: A program that computes Robust Multichip Average (RMA) gene expression summary values for Affymetrix GeneChip data. The software is free for academic use and can be downloaded from http://rmaexpress.bmbolstad.com. RMA normalization can also be performed using R (http://www.r-project.org) and Bioconductor (http://www.bioconductor.org).
dCHIP: Cheng Li and Wing Hung Wong developed the DNA-Chip Analyzer (dCHIP), which implements model-based expression analysis for Affymetrix gene expression arrays. The software readily processes both Affymetrix raw data (dat and cel files) and processed data (quantified expression values as a tab-delimited file). It can also analyze large data sets such as SNP arrays, exon arrays, and tiling arrays. Other features of the software are normalization and quality control, hierarchical clustering, and comparison of samples.
A few other software packages that are free and easy to access are SNOMAD (a web-based tool), TM4 (Spotfinder, Microarray Data Analysis System), Genesis, and Gene Expression Model Selector (Mehta & Rani, 2011).
MIAME (minimum information about a microarray experiment): The FGED Society developed this standard for reporting microarray experiments. It specifies the information required to interpret the results of an experiment unambiguously. More precisely, it describes the information needed to make microarray data easy to interpret, which in turn supports the development of data analysis tools. Public databases such as ArrayExpress, the Stanford Microarray Database, and the Gene Expression Omnibus store gene expression data according to the MIAME standard. These databases now also offer additional facilities for data analysis and annotation (Brazma et al., 2001; Kremer et al., 2001).
3.5 Next-generation sequencing technologies

The three most prominent next-generation sequencing (NGS) platforms, namely, the Roche 454 platform (Roche Life Sciences), the Applied Biosystems SOLiD platform (Applied Biosystems), and the Illumina (previously known as Solexa) Genome Analyzer and HiSeq platforms (Illumina), are used at large scale.
3.5.1 Illumina sequencing

Bruno Canard and Simon Sarfati at the Pasteur Institute in Paris first devised this technique. Shankar Balasubramanian and David Klenerman of Cambridge University then developed it further and founded Solexa, a company later acquired by Illumina. The method is based on reversible dye terminators that allow single bases to be identified as they are incorporated into DNA strands. Reversible termination sequencing is a sequencing-by-synthesis approach that infers the template sequence by stepwise primer elongation. As implemented on the Illumina platform, it is classed as a second-generation sequencing technology.
Ion Torrent sequencing detects the hydrogen ions that are released during the polymerization of DNA, using a semiconductor chip; the technology was released in February 2010. It is also known as pH-mediated sequencing, silicon sequencing, or semiconductor sequencing.
3.5.1.1 Cost of sequencing full genome

1. In June 2009, Illumina announced its Personal Full Genome Sequencing Service at $48,000 per genome.
2. In November 2009, Complete Genomics sequenced a complete human genome for $1700.
3. In May 2011, Illumina lowered the price of its Full Genome Sequencing service to $5000 per human genome, or $4000 for orders of 50 or more.
4. Several companies, namely, Life Technologies in January 2012, Oxford Nanopore Technologies in February 2012, and Illumina in February 2014, claimed that, as the cost of sequencing continued to decline, their equipment would achieve the $1000 genome.
3.5.2 Applications of next-generation sequencing

1. Sequencing methods determine the exact order of nucleotides in DNA, and the DNA sequence can be used to elucidate the genetic information of any biological system. The Sanger sequencing method, developed by F. Sanger in 1975, was the first-generation sequencing method. Because of its inherent limitations in throughput, speed, scalability, and resolution, second-generation sequencing, or NGS, was developed to meet the rising demand for a cheaper and faster sequencing method.
2. The basic idea behind NGS is to sequence many thousands of DNA fragments from a single sample in parallel, which is also known as massively parallel sequencing. It allows large stretches of DNA to be sequenced and produces hundreds of gigabases of data in a single sequencing run.
3. Third-generation sequencing methods have been developed, but they are not yet as mature as second-generation methods (Hayden, 2009). Being in their infancy, they have not been as widely accepted as NGS methods.
4. Molecular biology: NGS plays a vital role in studying whole genomes and the proteins they encode. Information on changes in genes and their association with various diseases and phenotypes helps researchers to understand these conditions and to identify drug targets.
5. Evolutionary biology: Sequencing helps to estimate the relationships between organisms and how they developed.
6. Medicine: Medical technicians can use sequencing methods to determine whether a patient carries a risk of genetic disease.
7. Forensics: DNA sequencing methods are established in DNA profiling and paternity testing for forensic identification. Samples such as fingerprints, hair, and saliva are used to detect the distinguishing DNA patterns on which identification is based. Because every living organism carries unique DNA, a unique pattern can be determined by DNA testing. Only in rare cases do two individuals share exactly the same DNA pattern.
NGS methods compare favorably with traditional Sanger sequencing by providing a much faster alternative: researchers can sequence a whole genome in a single day. For example, an Illumina platform, at a cost of less than $5000 per genome, can sequence more than five human genomes in a single run and generate the data within a week. High-throughput sequencing (HTS) can also be used to determine the genes, including their regulatory pathways, that are associated with diseases.
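The link between run output and the number of genomes sequenced can be made concrete with simple coverage arithmetic: the bases required equal the number of genomes multiplied by the sequencing depth and the genome size. The sketch below is a back-of-the-envelope illustration only. The haploid human genome size of about 3.1 billion base pairs and the 30-fold depth in the usage example are assumptions introduced here, not figures from this chapter, and the function names are hypothetical.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def bases_required(n_genomes, coverage, genome_bp=HUMAN_GENOME_BP):
    """Total bases a run must yield to sequence n_genomes at the given depth."""
    return n_genomes * coverage * genome_bp


def mean_coverage(total_bases, n_genomes=1, genome_bp=HUMAN_GENOME_BP):
    """Average depth per genome obtained from a run that yields total_bases."""
    return total_bases / (n_genomes * genome_bp)


if __name__ == "__main__":
    # Five human genomes at an assumed 30-fold depth
    needed = bases_required(n_genomes=5, coverage=30)
    print(f"Bases required: {needed / 1e9:.0f} gigabases")
```

Under these assumptions, five genomes correspond to several hundred gigabases, which is consistent with the run outputs of hundreds of gigabases mentioned in the text.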
Exome sequencing reveals disease-related variations and mutations in the exome. It helps to determine the protein-coding regions within the genome.
Targeted resequencing focuses sequencing on genomic regions of interest. Because it covers only a small subset of the genome, such as the exome, it has the advantage of avoiding higher sequencing costs.
Chromatin immunoprecipitation sequencing (ChIP-Seq): This method analyzes the interactions among proteins, DNA, and RNA. It enables the identification of the binding sites of DNA-associated proteins and helps to interpret regulatory events such as gene regulation, DNA repair, and DNA synthesis.
RNA sequencing (RNA-seq): This transcriptome sequencing approach supports transcript analysis and the detection of transcripts with low expression levels, with or without a reference sequence. The method is also more precise in quantifying expression levels.
3.5.3 Bioinformatics tools for next-generation sequencing

TopHat: It is open-source software that aligns RNA-seq reads to a reference genome without relying on known splice sites (Lee et al., 2012).
Bambino: It is a viewer for next-generation sequence files (Edmonson, Zhang, & Yan, 2011).
Tablet: It is Java based and available for Linux, OS X, Windows, and Solaris platforms, in both 32- and 64-bit versions. It provides a sequence-level view as well as a contig overview, and it highlights disagreements between the mapped reads and the reference or consensus sequence.
The Integrative Genomics Viewer: It is an open-source visualization tool (http://www.broadinstitute.org/igv/) for exploring large genomic data sets. It supports a variety of data types, namely, expression and copy-number arrays, RNA interference screens, methylation, genomic annotations, and gene expression.
The Savant (Sequence Annotation, Visualization and ANalysis Tool) Genome Browser: It is an open-source desktop browser for visualizing and analyzing genomic data, including HTS data such as NGS data, with low memory requirements.
MagicViewer: It was developed for the visualization and annotation of short-read alignments.
Geneious: It is an analysis tool for visualizing and manipulating next-generation sequence data. It also provides tools for the assembly, alignment, and annotation of genomic reads and sequences, with exploratory alignment against public repositories using the BLAST sequence search capability.
Mass spectrometry: Orbitrap is the most advanced mass spectrometer available to date, with high resolution, high mass accuracy, and a large dynamic range, which makes it well suited to proteomic and metabolomic applications.
3.6 Databases and genome annotation

Genome annotation is the identification of functional elements in a genomic sequence. DNA sequencing produces sequences of unknown function (Abril & Castellano Hereza, 2019), and genome annotation determines the function of the product of a predicted gene by in silico methods. To do this, bioinformatics software must include several features: (1) signal sensors (e.g., for TATA box, start and stop codon, or poly-A signal detection); (2) content sensors (e.g., for G + C content, codon usage, or dicodon frequency detection); and (3) similarity detection (e.g., between proteins from closely related organisms, mRNA from the same organism, or reference genomes) (de Sá et al., 2018). Biological databases fulfill these requirements.
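The first two feature types can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the method of any particular annotation package: the signal sensor is an exact string match (real tools use probabilistic models), the TATA box is represented by the simplified motif TATAAA, only the forward strand is scanned for open reading frames, and all function names are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Content sensor: fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_signal(seq, motif):
    """Signal sensor: start positions of every exact occurrence of a motif."""
    seq = seq.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)  # allow overlapping matches
    return positions


def find_orfs(seq, min_codons=2):
    """Signal sensor: forward-strand ORFs as (start, end) pairs, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a reading frame at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and keep scanning
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # invented sequence for illustration
    print(gc_content(toy), find_orfs(toy), find_signal(toy, "TATAAA"))
```

Similarity detection, the third feature type, is normally delegated to alignment tools such as BLAST and is therefore not sketched here.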
3.6.1 Biological databases

The biological databases fall under different categories: (1) DNA, (2) RNA, (3) protein, (4) expression, (5) pathway, (6)
disease, (7) nomenclature, (8) literature, (9) standard, and (10) ontology (Zou, Ma, Yu, & Zhang, 2015) (Fig. 3.3).
On the basis of source, there are two types of database: primary and secondary.
FIGURE 3.3 Types of biological databases. Adapted from NCBI.

3.6.1.1 Primary database

The primary databases contain biomolecular data in its original form. EMBL, GenBank, DDBJ, SWISS-PROT, TrEMBL, and PIR constitute the primary databases.
3.6.1.1.1 DNA databases

GenBank is a representative DNA database; as of December 2014, it comprised over 184 billion nucleotide bases in more than 179 million sequences. Human DNA databases provide the reference genome (e.g., NCBI RefSeq), profile human genetic variation (e.g., dbSNP), associate genotype with phenotype (e.g., EGA), and help to identify human microbiome metagenomes (e.g., IMG/HMP) (Zou et al., 2015).
EMBL: The EMBL nucleotide sequence database is maintained in collaboration with GenBank and DDBJ.
DDBJ: The DNA Data Bank of Japan collects DNA sequences.
SWISS-PROT: It is a protein database that, as of January 2015, contained about 547,357 manually annotated proteins. It aims to provide minimal redundancy and a high level of integration with other databases. The Protein Data Bank (PDB), established in 1971, is another protein database; it holds 3D structures of biological macromolecules as determined by X-ray crystallography and nuclear magnetic resonance (NMR). As of December 30, 2014, PDB comprised 105,465 biological macromolecular structures, of which 27,393 entries belong to human.
3.6.1.1.2 RNA databases

Human RNA databases are constructed for decoding ncRNAs (e.g., GENCODE) (Consortium, 1., 2012), specifically lncRNAs, which attract current interest (e.g., LncRNAWiki). RNAcentral (http://rnacentral.org) is a representative RNA database. It provides unified access to ncRNA sequence data supplied by multiple databases such as Rfam, lncRNAdb, and miRBase (Table 3.1).
3.6.2 Functional genomic databases

These databases provide information about the functions of genes. For example, scientists commonly use databases together with information retrieval systems such as BLAST to predict and analyze the function of new or unknown genes. The foremost dedicated functional genomic databases are described in the following sections.
TABLE 3.1 The biological information and the type of source.
S. no. | Type of information | Source
1. | Nucleotide sequence | GenBank (http://www.ncbi.nlm.nih.gov/genbank/); EMBL (http://www.ebi.ac.uk/embl/); DDBJ (http://www.ddbj.nig.ac.jp)
2. | Nonredundant EST sequence | UniGene (http://www.ncbi.nlm.nih.gov/unigene); TIGR Gene Indices (http://www.tigr.org/tdb/tgi)
3. | Protein sequence and annotation | Uniprot (http://www.uniprot.org/)
4. | Protein structure | PDB (http://www.rcsb.org/pdb/)
5. | Metabolic pathway | KEGG (http://www.genome.ad.jp/kegg/)
6. | Gene expression (cDNA microarray) data | GEO (http://www.ncbi.nlm.nih.gov/geo/); ArrayExpress (http://www.ebi.ac.uk/arrayexpress/); SMD (http://smd.princeton.edu/)
7. | Database of essential genes for prokaryotes and eukaryotes | DEG (http://tubic.tju.edu.cn/deg/)
EST, Expressed sequence tag.

Source: Adapted from Katara, P. (2014). Potential of Bioinformatics as functional genomics tools: an overview. Network Modeling Analysis in Health
Informatics and Bioinformatics, 3, 52.
3.6.2.1 Rice functional genomics

The KOME database (Knowledge-based Oryza Molecular Biological Encyclopedia) gathers about 38,000 full-length cDNAs of japonica cv. Nipponbare. The rice indica cDNA database (RICD) comprises 10,081 and 12,727 full-length cDNA sequences from Guangluai 4 and Minghui 63, respectively. The Collection of Rice Expression Profiles (CREP) information platform holds expression profiles, obtained with the Affymetrix GeneChip Rice Genome Array, of the elite hybrid rice Shanyou 63 and its parents Zhenshan 97 and Minghui 63 under various stressful conditions. The transcriptomes of the super hybrid rice LYP9 and its parental cultivars 9311 and PA64s were compared using gene expression microarrays. The Affymetrix GeneChip Rice Genome Array was also used to determine expression quantitative trait loci (eQTLs) in rice seedlings and in flag leaves during the heading period, using recombinant inbred lines developed from a cross between Zhenshan 97 and Minghui 63 (Wang et al., 2010; Wei et al., 2014).
Rice functional genomics research has enriched breeding resources with genes such as Xa21 and xa13, which confer resistance against rice bacterial leaf blight. Pigm and Bsr-d1 can also be used as breeding sources for disease resistance, specifically to rice blast. Wild rice contributed the gene Bph14, originally identified in Oryza minuta, for resistance against brown planthopper. Local varieties have also contributed various alleles, such as the brown planthopper resistance gene BPH3, the salt tolerance gene HKT2, the submergence tolerance gene Sub1, and the high-temperature tolerance gene OsTT1. These genes have huge potential for rice breeding.
Some of the databases for rice functional genomics are IC4R (http://ic4r.org/), RICD (http://202.127.18.221/ricd/index.html), TIGR (http://rice.plantbiology.msu.edu/), IRRI (http://irri.org/), and CREP (http://crep.ncpgr.cn/) (Li et al., 2018).
3.6.2.2 Functional genomics in Malvaceae family plants

The Malvaceae family includes several economically important flowering plant species, such as cotton, cacao, and durian. MaGenDB was developed as a user-friendly functional genomic hub for this plant community and is available at http://magen.whu.edu.cn. It offers 367 deep-sequencing data sets of eight types for 13 species. The database supports the generation of multiple dynamic charts and hyperlinks, and all functional annotations for a gene, transcript, and protein are displayed on a single page named Genewiki. In total, MaGenDB stores 374 processed omics data sets from nine techniques, 18 types of annotation, and more than 24 million functional elements, presented in a user-friendly way through well-designed custom dynamic charts. In conclusion, the database fills a gap for an important plant family and thereby provides a functional comparison system (Wang et al., 2020).
3.6.2.3 Functional genomics in fungi

FungiDB (available at http://FungiDB.org) is a functional genomic resource developed in partnership with the NIAID-funded Eukaryotic Pathogen Bioinformatics Resource Center (http://EuPathDB.org). The database contains genome sequence and annotation from 18 species spanning several fungal classes, including the Ascomycota classes Eurotiomycetes, Sordariomycetes, and Saccharomycetes; the Basidiomycota classes Pucciniomycetes and Tremellomycetes; and the basal “Zygomycete” lineage Mucormycotina. FungiDB provides various functional genomic data sets: (1) EST data retrieved from dbEST (http://www.ncbi.nlm.nih.gov/dbEST/) for Aspergillus flavus, Aspergillus terreus, Aspergillus niger, and Gibberella moniliformis; (2) cell cycle microarray data for Saccharomyces cerevisiae based on different synchronization methods; (3) RNA-sequence data from Rhizopus oryzae during hyphal growth; and (4) yeast two-hybrid data from S. cerevisiae (Stajich et al., 2012).
Other databases for genome studies include AgBase, a functional genomic resource (http://www.agbase.msstate.edu/);
MoccaDB, for studying diversity within the Rubiaceae family (http://moccadb.mpl.ird.fr/); TFGD, the tomato functional
genomics database (http://ted.bti.cornell.edu/); and SFGD, the soybean functional genomics database
(http://bioinformatics.cau.edu.cn/SFGD/).
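The resources named in this subsection can be kept in a small lookup table so that a reader can find the one relevant to an organism group. The following Python snippet is only an illustrative sketch: the class name GenomicResource and the function find_resources are hypothetical, the scope labels paraphrase the descriptions given in the text, and the URLs are copied from the text without checking that they are still reachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicResource:
    """One functional genomic database as described in the text (hypothetical record type)."""
    name: str
    scope: str
    url: str


# Names and URLs are taken from this subsection; availability is not verified.
RESOURCES = (
    GenomicResource("MaGenDB", "Malvaceae plants", "http://magen.whu.edu.cn"),
    GenomicResource("FungiDB", "fungi", "http://FungiDB.org"),
    GenomicResource("AgBase", "functional genomic resource", "http://www.agbase.msstate.edu/"),
    GenomicResource("MoccaDB", "Rubiaceae family", "http://moccadb.mpl.ird.fr/"),
    GenomicResource("TFGD", "tomato", "http://ted.bti.cornell.edu/"),
    GenomicResource("SFGD", "soybean", "http://bioinformatics.cau.edu.cn/SFGD/"),
)


def find_resources(keyword: str) -> list[GenomicResource]:
    """Return resources whose name or scope contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [
        r for r in RESOURCES
        if needle in r.name.lower() or needle in r.scope.lower()
    ]


if __name__ == "__main__":
    for resource in find_resources("tomato"):
        print(f"{resource.name}: {resource.url}")
```

A plain substring match is used here for simplicity; a curated mapping from species to database would be more precise but would require information not given in the text.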
3.7 Conclusion
The genomic data produced by sequencing have created major challenges as well as opportunities for studying the
genomes of organisms. The bioinformatic tools covered in this review, including databases and software, play an
effective role in meeting those challenges. Several functional genomic approaches and their associated databases have
been described as means of tackling the biological problems that arise from the sheer size of the data. These
functional genomic databases are continuously updated with mined knowledge and new information in order to provide
more reliable information for genomics-related analysis.
References
Abril, J. F., & Castellano Hereza, S. (2019). Genome annotation (pp. 195–209). Elsevier.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215,
403–410.
Bouchez, D., & Höfte, H. (1998). Functional genomics in plants. Plant Physiology, 118(3), 725–732.
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., . . . Roch, K. G. (2013). An introduction to Functional Genomics
and System Biology. Advances in Wound Care, 2(9), 490–498.
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C. &
Gaasterland, T. (2001). Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nature Genetics,
29(4), 365–371.
Bunnik, E. M., & Le Roch, K. G. (2013). An introduction to functional genomics and systems biology. Advances in Wound Care, 2(9), 490–498.
Chen, H., Centola, M., Altschul, S. F., & Metzger, H. (1998). Characterization of gene expression in resting and activated mast cells. The Journal of
Experimental Medicine, 188, 1657–1668.
Collins, F. S., & Fink, L. (1995). The Human Genome Project. Alcohol Health and Research World, 19(3), 190–195.
1000 Genomes Project Consortium. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56–65.
de Sá, P. H., Guimarães, L. C., das Graças, D. A., de Oliveira Veras, A. A., Barh, D., Azevedo, V., . . . Ramos, R. T. (2018). Next-generation
sequencing and data analysis: Strategies, tools, pipelines and protocols. In Omics Technologies and Bio-Engineering (pp. 191–207). Academic Press.
Edmonson, M. N., Zhang, J., Yan, C., et al. (2011). Bambino: A variant detector and alignment viewer for next-generation sequencing data in the
SAM/BAM format. Bioinformatics (Oxford, England), 27, 865–866.
Govindarajan, R., Duraiyan, J., Kaliyappan, K., & Palanisamy, M. (2012). Microarray and its applications. Journal of Pharmacy & Bioallied Sciences,
4(Suppl 2), S310.
Hayden, E. C. (2009). Genome sequencing: the third generation. Nature, 457(7231), 768–769.
Hidalgo, O. B. (2003). Functional genomics and bioinformatics: an overview. Biotecnología Aplicada, 20(3), 183.
Katara, P. (2014). Potential of Bioinformatics as functional genomics tools: An overview. Network Modeling Analysis in Health Informatics and
Bioinformatics, 3, 52.
Khan, N. T. (2018). Structural and Functional Bioinformatics. Letters in Health and Biological Science, 3(1), 7–11.
Kremer, S., Stewart, J., Taylor, R., Vilo, J., & Vingron, M. (2001). Minimum information about a microarray experiment (MIAME) toward
standards for microarray data. Nature Genetics, 29, 365–371.
Lee, H. C., Lai, K., Lorenc, M. T., Imelfort, M., Duran, C., & Edwards, D. (2012). Bioinformatics tools and databases for analysis of next-generation
sequence data. Briefings in Functional Genomics, 11(1), 12–24.
Li, Y., Xiao, J., Chen, L., Huang, X., Cheng, Z., Han, B., & Wu, C. (2018). Rice functional genomics research: past decade and future. Molecular
Plant, 11(3), 359–380.
Mehta, J. P., & Rani, S. (2011). Software and tools for microarray data analysis. In Gene Expression Profiling (Vol. 784, pp. 41–53). Humana Press.
Patino, W. D., Mian, O. Y., & Hwang, P. M. (2002). Serial analysis of gene expression: technical considerations and applications to cardiovascular
biology. Circulation Research, 91(7), 565–569.
Stajich, J. E., Harris, T., Brunk, B. P., Brestelli, J., Fischer, S., Harb, O. S., & Stoeckert, C. J., Jr. (2012). FungiDB: an integrated functional genomics
database for fungi. Nucleic Acids Research, 40(D1), D675–D681.
Tabata, S., Kaneko, T., Nakamura, Y., Kotani, H., Kato, T., Asamizu, E., Miyajima, N., Sasamoto, S., Kimura, T., Hosouchi, T., & Kawashima, K.
(2000). Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature, 408(6814), 823–826.
Tuteja, R., & Tuteja, N. (2004). Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools. BioEssays, 26(8), 916–922.
Van Kampen, A. H., van Schaik, B. D., Pauws, E., Michiels, E. M. C., Ruijter, J. M., Caron, H. N., & van Der Mee, M. (2000). USAGE: A web-based
approach towards the analysis of SAGE data. Bioinformatics (Oxford, England), 16(10), 899–905.
Velculescu, V. E., Zhang, L., Vogelstein, B., & Kinzler, K. W. (1995). Serial analysis of gene expression. Science (New York, N.Y.), 270, 484–487.
Wang, D., Fan, W., Guo, X., Wu, K., Zhou, S., Chen, Z., . . . Zhou, Y. (2020). MaGenDB: a functional genomics hub for Malvaceae plants. Nucleic
Acids Research, 48(D1), D1076–D1084.
Wang, L., Xie, W., Chen, Y., Tang, W., Yang, J., Ye, R., Liu, L., Lin, Y., Xu, C., Xiao, J., et al. (2010). A dynamic gene expression atlas covering
the entire life cycle of rice. The Plant Journal: for Cell and Molecular Biology, 61, 752–766.
Wang, S. M. (2004). Understanding SAGE data. Trends in Genetics, 23(1), 42–50.
Wei, L., Gu, L., Song, X., Cui, X., Lu, Z., Zhou, M., Wang, L., Hu, F., Zhai, J., Meyers, B. C., et al. (2014). Dicer-like 3 produces transposable
element-associated 24-nt siRNAs that control agricultural traits in rice. Proceedings of the National Academy of Sciences of the United States of
America, 111, 3877–3882.
Yamamoto, M., Wakatsuki, T., Hada, A., & Ryo, A. (2001). Use of serial analysis of gene expression (SAGE) technology. Journal of Immunological
Methods, 250(1–2), 45–66.
Zou, D., Ma, L., Yu, J., & Zhang, Z. (2015). Biological databases for human research. Genomics, Proteomics & Bioinformatics, 13(1), 55–63.
