
Transcriptomics

Transcriptomics is the expression analysis of populations of genes: the detection of genes that are differentially expressed under different conditions, treatments, developmental stages and stresses.
The transcriptome is the set of all RNA molecules, including mRNA, tRNA and other non-coding RNA, produced in one cell or a population of cells, such as a particular tissue or organ. The mRNA level reflects both the intensity of transcription and mRNA stability. Transcriptomics technologies are the techniques used to study an organism's transcriptome, the sum of all of its RNA transcripts.
History: Before Transcriptomics
Studies of individual transcripts were being performed several years before any
transcriptomics approaches were available. Quantification of individual transcripts by
northern blotting, and later by reverse transcription quantitative PCR (RT-qPCR), was popular, but these
methods are laborious and can capture only a tiny subsection of a transcriptome.
Consequently, the manner in which a transcriptome as a whole is expressed and regulated
remained unknown until high-throughput techniques were developed. The explosion in
transcriptomics has been due to the rapid development of new technologies with an improved
sensitivity and economy.
[Figure: northern blotting and RT-qPCR]

History of Transcriptomics
The word "transcriptome" was first used in the 1990s. The first attempt at capturing a partial human transcriptome was published in 1991 and reported 609 mRNA sequences from human brain. In 1995, one of the earliest sequencing-based transcriptomic methods, serial analysis of gene expression (SAGE), was developed. In 2008, two human transcriptomes, composed of millions of transcript-derived sequences covering 16,000 genes, were published.
The two dominant techniques, microarrays and RNA-Seq, were developed in the mid-1990s and 2000s. Transcriptomes of different disease states, tissues, or even single cells are now routinely generated.
Transcriptomics uses two technologies:
1. Microarrays
2. RNA Sequencing (RNA-Seq)

[Figure: gene expression]

RNAs in Transcriptome
Not all RNAs are translated into proteins. Some non-coding RNAs serve a structural function, for example rRNAs in the assembly of ribosomes; some act as transporters, e.g. tRNAs; and others have regulatory functions, for example siRNAs (short interfering RNAs) and lncRNAs (long non-coding RNAs). Although they are not translated, these non-coding RNAs can and often do play roles in human diseases such as cancer and cardiovascular and neurological disorders. Transcriptomics is most commonly applied to mRNAs, the coding transcripts, but it also provides important data on the non-coding RNA content of the cell, including rRNA, tRNA, lncRNA, siRNA and others.
Basics of Transcriptomics
The study of the transcriptome provides an overall picture of the mechanism and the variation of the
RNA in cells or tissues. In gene expression studies, levels of different mRNAs are quantified. This is
one of the most widely used methods to gain insight into the biological activity of a cell or a tissue at
any given time. Other types of RNA molecules that are involved in regulating transcription and
translation can also be quantified. The widespread use of transcriptome sampling strategies is
a complementary approach to genome sequencing.
Transcriptome map or gene expression atlas
A transcriptome map is defined as a global estimation of gene expression in all possible cells, tissues,
organs, or parts of an organism during the life cycle of the organism from embryogenesis to
senescence.

Recent studies have revealed that about 90% of the eukaryotic genome is transcribed.
Interestingly, only 1–2% of these transcripts encode for proteins; the majority are transcribed
as ncRNAs. The transcriptome of an organism consists of:
1. mRNA
2. tRNA
3. rRNA
Non-coding RNA (ncRNA):
• MicroRNA (miRNA)
• Small interfering RNA (siRNA)
• Long non-coding RNA (lncRNA)
• Natural antisense short interfering RNA (natsiRNA)
• Promoter-associated short RNAs (PasRNAs)
• Piwi-interacting RNAs (piRNAs)
• Enhancer RNAs (eRNAs)
• Promoter-associated RNAs (PARs)

Microarrays
Microarrays consist of short nucleotide oligomers, known as "probes", which are arrayed on a solid
substrate (e.g., glass). Transcript abundance is determined by hybridization of fluorescently labelled
transcripts to these probes. The fluorescence intensity at each probe location on the array indicates the
transcript abundance for that probe sequence.
The steps, in brief, are as follows:
1. Within the organism, genes are transcribed and spliced (in eukaryotes) to produce mature mRNA transcripts (red in the figure).
2. The mRNA is extracted and copied into stable double-stranded cDNA (blue in the figure).
3. The ds-cDNA is then fragmented and fluorescently labelled (orange in the figure).
4. The labelled fragments bind to an ordered array of complementary oligonucleotides, and measurement of fluorescence intensity across the array indicates the abundance of a predetermined set of sequences.
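
As an illustration of how probe intensities are turned into an expression comparison, the sketch below computes a log2 ratio per probe between two arrays. It is a minimal example under stated assumptions: the probe names, the intensity values and the `floor` parameter are hypothetical, and real microarray pipelines also apply background correction and normalization, which are omitted here.

```python
import math

def log2_ratios(treated, control, floor=1.0):
    """Return log2(treated / control) for every probe present on both arrays.

    Intensities below `floor` are raised to `floor` so that the logarithm
    is always defined (an assumption of this sketch, not a standard value).
    """
    ratios = {}
    for probe in treated.keys() & control.keys():
        t = max(treated[probe], floor)
        c = max(control[probe], floor)
        ratios[probe] = math.log2(t / c)
    return ratios

# Hypothetical fluorescence intensities for three probes.
control = {"probe_A": 200.0, "probe_B": 800.0, "probe_C": 50.0}
treated = {"probe_A": 800.0, "probe_B": 800.0, "probe_C": 25.0}

ratios = log2_ratios(treated, control)
for probe in sorted(ratios):
    print(probe, ratios[probe])
```

A positive value suggests higher abundance in the treated sample, a negative value lower abundance, and zero no change.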

Limitations of Microarrays:
Microarray analysis is limited to the detection of known transcripts.
Microarrays require some prior knowledge of the organism of interest, for example in
the form of an annotated genome sequence or a library of ESTs that can be used to generate
the probes for the array.

Other limitations:
Poor sensitivity
Low specificity
A limited dynamic range
RNA-Sequencing (RNA-Seq)
RNA-Seq refers to the combination of a high-throughput sequencing methodology with
computational methods to capture and quantify the transcripts present in an RNA extract.
RNA-Seq may be used to identify which genes are active at a particular point in time. It
works by sequencing the RNA molecules in a sample and profiling the expression of a particular gene
by counting the number of times its transcripts have been sequenced. The summarized
RNA-Seq data are widely known as count data.
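
The idea of count data can be sketched in a few lines. The example below assumes that each sequenced read has already been assigned to a gene (or to `None` when it matches no gene); the gene names and the `count_reads_per_gene` helper are hypothetical and only illustrate the tallying step.

```python
from collections import Counter

def count_reads_per_gene(read_assignments):
    """Tally how many reads were assigned to each gene, skipping unassigned reads."""
    return Counter(gene for gene in read_assignments if gene is not None)

# Hypothetical per-read gene assignments; None marks a read that hit no gene.
read_assignments = ["geneA", "geneB", "geneA", None, "geneC", "geneA", "geneB"]

counts = count_reads_per_gene(read_assignments)
for gene, n in sorted(counts.items()):
    print(gene, n)
```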

RNA Sequencing Process
1. Within the organism, genes are transcribed and spliced (in eukaryotes) to produce mature
mRNA transcripts (red).
2. The mRNA is extracted from the organism, fragmented and copied into stable
double-stranded cDNA (blue).
3. The ds-cDNA is sequenced using high-throughput, short-read sequencing methods.
4. These sequences can then be aligned to a reference genome sequence to reconstruct which
genome regions were being transcribed.
These data can be used to annotate where expressed genes are, their relative expression
levels, and any alternative splice variants.
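
The alignment step can be pictured with a toy example that assigns already-aligned read positions to annotated gene intervals. This is a sketch under simplifying assumptions: the coordinates and gene names are invented, each read is represented only by its start position, and real tools also handle strand, spliced reads and overlapping genes.

```python
def assign_read(chrom, pos, genes):
    """Return the first gene whose interval contains the read position, else None."""
    for name, (g_chrom, start, end) in genes.items():
        if chrom == g_chrom and start <= pos <= end:
            return name
    return None

# Hypothetical annotation: gene -> (chromosome, start, end), inclusive coordinates.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
    "geneC": ("chr2", 50, 300),
}

# Hypothetical aligned reads as (chromosome, start position).
aligned_reads = [("chr1", 120), ("chr1", 950), ("chr1", 600), ("chr2", 299), ("chr1", 500)]

assignments = [assign_read(chrom, pos, genes) for chrom, pos in aligned_reads]
print(assignments)
```

Reads that fall outside every annotated interval (here the read at chr1:600) are the ones that can point to unannotated transcribed regions.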

Uses of RNA Sequencing
• Evaluate absolute transcript levels in sequenced and unsequenced organisms.
• Detect novel transcripts and isoforms.
• Map exon/intron boundaries and splice junctions.
• Analyse alternative splicing.
• Reveal sequence variations (e.g. SNPs) and splice variants.

Outcome of RNA Sequencing
Sequencing depth (library size): total number of reads mapped to the genome.
Gene length: total number of bases in the gene.
Gene count: number of reads mapping to the gene (the expression measurement).
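
These three quantities are commonly combined to make counts comparable between genes and samples. The sketch below shows two widely used normalizations, RPKM and TPM, which are not named in the text above and are added here as an illustration; the gene names, counts, lengths and library size are hypothetical.

```python
def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of gene per million mapped reads."""
    return count / ((gene_length_bp / 1_000) * (library_size / 1_000_000))

def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}

# Hypothetical gene counts, gene lengths (bp) and sequencing depth.
counts = {"geneA": 500, "geneB": 100, "geneC": 400}
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
library_size = 10_000_000

rpkm_a = rpkm(counts["geneA"], lengths["geneA"], library_size)
tpm_values = tpm(counts, lengths)
print(rpkm_a)
print(tpm_values)
```

In this toy data geneC has four times the reads of geneB but is also four times longer, so the two genes receive the same TPM value.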

Types of RNA seq technologies
RNA-Seq, also called RNA sequencing, is a technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological
sample at a given moment, analysing the continuously changing cellular transcriptome.
1. The first NGS technology for RNA-Seq was introduced by Roche in 2004, based on
"pyrosequencing" technology. Sequence reads with an average length of approximately 1,000 bp can be produced
by the latest Roche 454 GS FLX+ system.
2. Sequencing by Oligonucleotide Ligation and Detection (SOLiD) technology, released in 2007, uses a
sequencing chemistry catalyzed by DNA ligase and produces an average read length of 75 bp.

Gene Expression Maps in Plants
The era of plant-wide transcriptomics started in 2005, when the first detailed transcriptome map was
constructed for Arabidopsis thaliana [1]. Arabidopsis is a classical object in plant biology and the first
plant with a sequenced genome [2,3]; the next stage of large-scale studies on A. thaliana was
the construction of the Affymetrix ATH1 microarray. The transcriptome map of Arabidopsis thaliana [1]
became a milestone in plant transcriptomics and shaped the main areas of investigation for subsequent
atlases.
The growing body of genomic data includes both model objects (the model grass Brachypodium
distachyon and Capsella bursa-pastoris [4,5]) and agricultural plants, i.e. B73 maize, rice (Oryza
sativa L. ssp. japonica and Oryza sativa L. ssp. indica), Medicago and soybean [6-10], as
well as many other species.
Progress in sequencing and assembly technologies has allowed the creation of detailed transcriptome
maps using both microarray and RNA-seq data. At least 43 transcriptome maps thoroughly
describe 32 plants, including many model objects and agricultural plants; therefore, reanalysis
of the existing atlases can be of use in a variety of studies. Even so, about 40 transcriptome maps
are a drop in the ocean of plant biodiversity, which encompasses more than 350,000 species in
angiosperms alone and more than 30,000 species of non-seed plants (2016). Several families
have been well studied, and many model and agricultural plants are already covered. More than
one transcriptome map has been constructed for several species (for instance, Arabidopsis,
sorghum and rice), reflecting the common practice of updating previous datasets, whereas for
novel transcriptomic data one should look outside the popular families.
A transcriptome map is a powerful source of functional information. It is the result of the
genome-wide expression analysis of a broad sampling of tissues and/or organs from different
developmental stages and/or environmental conditions. In plant science (Klepikova and Penin, 2019),
the application of transcriptome maps extends from the inference of gene regulatory networks
to evolutionary studies. However, only some of these data have been integrated into
databases that enable analyses to be conducted without the raw data; without this integration,
extensive data preprocessing is required, which limits data usability.
Comparing transcriptome data between plants using a network module finding algorithm (Lee et al., 2019)
Expression analysis is commonly used to understand the tissue or stress specificity of genes
in large gene families. The goal of comparative transcriptome analysis is to identify
conserved co-expressed genes in two or more species. The traditional definition of orthologous
genes is based solely on sequence homology and synteny (physical co-localization of genetic
loci on the same chromosome), and not at all on gene expression patterns. In
contrast, comparative transcriptome analysis combines a comparison of gene sequences with
a comparison of expression patterns between homologous genes in different species.
Homologous genes have been reported to be expressed at different developmental stages, in
different tissue types, or under different stress conditions.
Comparative transcriptome analysis is an important tool for distinguishing those genes that have
retained functional conservation from those that have undergone functional divergence.
Comparative transcriptome analysis is particularly important for plant research, since most
molecular mechanistic studies in plants have been performed in model species, primarily
Arabidopsis thaliana. Most functional annotation of the genes of many other plant species
relies solely on sequence comparisons with Arabidopsis.
Steps Involved in Comparative Transcriptome Analysis
To compare transcriptomes between any two species, the steps are as follows:
1. Establish homologous relationships between proteins in the two species.
2. Identify expression data obtained from experiments that were performed under similar
conditions or on similar tissue types.
3. Compare the expression patterns between the two data sets. Example: in this protocol,
published time-course seed embryo expression data from Arabidopsis are compared with data
from the same tissue in soybean, as a demonstration of how to apply computational tools to
comparative transcriptome analysis.
Second Approach to Comparative Transcriptomes
In contrast with the time-course data described earlier, many other data sets have been reported
from "treatment–control" experiments (one time point only and two treatment conditions). For example,
soybean roots were treated with drought stress in one experiment. To address the question of functional
conservation versus functional divergence within gene families, these soybean root data can be
compared with transcriptome data from Arabidopsis roots under a similar stress. This is a relatively
simple problem because, in both experiments, we can identify lists of genes that are differentially expressed in
response to the same or similar treatments.
Identifying conserved co-expressed genes for treatment–control experiments is a simple
two-step process. First, one needs to identify a list of gene pairs that are homologous
between the two species. A simple BLAST search, or a more sophisticated approach
such as OMA, eggNOG or PLAZA, can be used to identify homologous genes. Second, the two lists
of differentially expressed genes can be compared to find whether any pairs of these
homologous genes appear in both lists. A more complex scenario arises when two time-series
experiments have been performed for the same developmental process in two different species.
Time-course data provide more data points than simple treatment–control experiments.
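
The two-step treatment–control comparison can be sketched as a filter over homologous pairs. The gene identifiers below are placeholders, not real locus names, and `conserved_de_pairs` is a hypothetical helper written for this illustration.

```python
def conserved_de_pairs(homolog_pairs, de_species_a, de_species_b):
    """Keep the homologous pairs in which both members are differentially expressed."""
    de_a, de_b = set(de_species_a), set(de_species_b)
    return [(a, b) for a, b in homolog_pairs if a in de_a and b in de_b]

# Step 1 (assumed already done): homologous gene pairs between the two species.
homolog_pairs = [
    ("ath_gene1", "gma_gene1"),
    ("ath_gene2", "gma_gene2"),
    ("ath_gene3", "gma_gene3"),
]

# Step 2: differentially expressed genes from each treatment-control experiment.
de_arabidopsis = ["ath_gene1", "ath_gene3", "ath_gene9"]
de_soybean = ["gma_gene1", "gma_gene2"]

conserved = conserved_de_pairs(homolog_pairs, de_arabidopsis, de_soybean)
print(conserved)
```

Pairs that survive the filter are candidates for conserved responses; pairs in which only one member responds are candidates for functional divergence.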

1. Identification of orthologous pairs between the two species using BLAST.
2. RNA-seq analysis to obtain co-expression networks, and running OrthoClust to cluster genes
with orthologous relations.
3. In the workflow figure, blue font indicates the software or scripts used.
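
A co-expression network links genes whose expression profiles rise and fall together. The sketch below builds such a network from time-series profiles using a Pearson correlation cutoff; the cutoff of 0.9, the gene names and the expression values are assumptions made for illustration, and this is not a description of how OrthoClust or the cited workflow constructs its networks.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            edges.append((g1, g2))
    return edges

# Hypothetical expression values at four time points.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}

edges = coexpression_edges(profiles)
print(edges)
```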
Steps for comparing two time-series experiments (Lee et al., 2019)
• Obtaining reciprocal best hit (RBH) genes
• Co-expression networks
• OrthoClust analysis
• Visualization of OrthoClust results as a network
• Visualization of OrthoClust results as expression profiles
• Effect of different parameters in OrthoClust analysis
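
The first step in the list, obtaining reciprocal best hits, can be sketched as follows. The example assumes BLAST results have already been parsed into (query, subject, bit score) tuples for each search direction; the identifiers and scores are invented, and the helper names are hypothetical.

```python
def best_hits(hits):
    """Map each query to the subject with the highest bit score."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical parsed hits: (query, subject, bit score).
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 180), ("a3", "b1", 90)]
b_vs_a = [("b1", "a1", 210), ("b2", "a2", 175), ("b2", "a1", 140), ("b3", "a3", 60)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

Here a3 hits b1 best, but b1 prefers a1, so the pair (a3, b1) is not reciprocal and is excluded.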

Fig. 3. Visualization of module 8 from the OrthoClust result. In this network, circles 1 and 4
mark groups of genes from Arabidopsis and soybean that have no ortholog in the other
species and only co-expression partners from the same species. Circles 2 and 3 mark genes that have an
orthologous partner in the other species as well as co-expression partners from the same species.
Green nodes are genes from Arabidopsis, and red nodes are genes from soybean. Edges from the co-expression network of
Arabidopsis are green, and those of soybean are red. Black double-lined edges indicate homologous
pairs between soybean and Arabidopsis genes. Four genes from the raffinose biosynthesis pathway are
highlighted in blue, and their homologous pairs have thicker edges.
Applications of Transcriptomics in Plant Breeding
1. Transcriptome assembly and profiling result in a large collection of expressed sequence
tags (ESTs) for almost all the important plant species. The plant EST database has recently
passed the 5 million sequence landmark. More than 50 plant species, each with >5000 ESTs,
are represented.

2. Characterization of all types of RNA, which helps to explain their roles in development, genome
maintenance and plant responses to environmental stresses. In addition to their post-transcriptional regulatory
role, some RNAs influence de novo methylation or other modifications that silence genes.
3. Prevalence of plants predated the development of NGS (Next generation sequencing).
4. eQTL (expression quantitative trait loci)
Transcript, protein and metabolite profiles can also be mapped directly onto a segregating population to
provide information on loci that control gene expression levels, protein modification or
levels of a particular secondary metabolite. The QTLs associated with
such traits are known as expression (eQTL), protein (pQTL) or metabolite (mQTL) QTLs.
Expression quantitative trait loci (eQTL) analyses identify single nucleotide polymorphisms
(SNPs) that are associated with the expression level of a gene. A gene-SNP pair in which the
expression of the gene is associated with the value of the SNP is referred to as an eQTL.
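
The association between a SNP and the expression of a gene can be illustrated with a simple linear regression of expression on genotype dosage. This is a minimal sketch: the dosage coding (0, 1 or 2 copies of the alternative allele), the sample values and the helper name are assumptions, and a real eQTL analysis would also test significance, include covariates and correct for multiple testing.

```python
def regression_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    return sxy / sxx

# Hypothetical data for six individuals of a segregating population.
genotypes = [0, 0, 1, 1, 2, 2]                # copies of the alternative allele
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]   # expression level of one gene

slope = regression_slope(genotypes, expression)
print(slope)
```

A slope clearly different from zero suggests that each additional copy of the allele shifts the expression of the gene, which is the pattern an eQTL describes.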

References
1. Schmid, M.; Davison, T.S.; Henz, S.R.; Pape, U.J.; Demar, M.; Vingron, M.; Schölkopf, B.; Weigel,
D.; Lohmann, J.U. A gene expression map of Arabidopsis thaliana development. Nat. Genet. 2005, 37,
501–506.
2. Meinke, D.W.; Cherry, J.M.; Dean, C.; Rounsley, S.D.; Koornneef, M. Arabidopsis thaliana: A
model plant for genome analysis. Science 1998, 282, 662, 679–682.
3. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis
thaliana. Nature 2000, 408, 796–815.
4. International Brachypodium Initiative. Genome sequencing and analysis of the model grass
Brachypodium distachyon. Nature 2010, 463, 763–768.
5. Kasianov, A.S.; Klepikova, A.V.; Kulakovskiy, I.V.; Gerasimov, E.S.; Fedotova, A.V.; Besedina,
E.G.; Kondrashov, A.S.; Logacheva, M.D.; Penin, A.A. High-quality genome assembly of Capsella
bursa-pastoris reveals asymmetry of regulatory elements at early stages of polyploid genome evolution.
Plant Journal, 2017, 91, 278–291.
6. Schnable, P.S.; Ware, D.; Fulton, R.S.; Stein, J.C.; Wei, F.; Pasternak, S.; Liang, C.; Zhang, J.;
Fulton, L.; Graves, T.A.; et al. The B73 maize genome: Complexity, diversity, and dynamics. Science
2009, 326, 1112–1115.
7. Goff, S.A.; Ricke, D.; Lan, T.H.; Presting, G.; Wang, R.; Dunn, M.; Glazebrook, J.;
Sessions, A.; Oeller, P.; Varma, H.; et al. A draft sequence of the rice genome (Oryza sativa
L. ssp. japonica). Science 2002, 296, 92–100.
8. Yu, J.; Hu, S.; Wang, J.; Wong, G.K.; Li, S.; Liu, B.; Deng, Y.; Dai, L.; Zhou, Y.; Zhang, X.;
et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 2002, 296,
79–92.
9. Young, N.D.; Debelle, F.; Oldroyd, G.E.; Geurts, R.; Cannon, S.B.; Udvardi, M.K.;
Benedito, V.A.; Mayer, K.F.; Gouzy, J.; Schoof, H.; et al. The Medicago genome provides
insight into the evolution of rhizobial symbioses. Nature 2011, 480, 520–524.
10. Schmutz, J.; Cannon, S.B.; Schlueter, J.; Ma, J.; Mitros, T.; Nelson, W.; Hyten, D.L.;
Song, Q.; Thelen, J.J.; Cheng, J.; et al. Genome sequence of the palaeopolyploid soybean.
Nature 2010, 463, 178–183.
11. Provart, N.J.; Alonso, J.; Assmann, S.M.; Bergmann, D.; Brady, S.M.; Brkljacic, J.; et al. 50 years of
Arabidopsis research: Highlights and future directions. New Phytol. 2016, 209, 921–944.
12. Lee, J.; Heath, L.S.; Grene, R.; Li, S. Comparing time series transcriptome
data between plants using a network module finding algorithm. Plant Methods 2019, 15, 61–77.
13. Klepikova, A.V.; Penin, A.A. Gene expression maps in plants: Current state and
prospects. Plants 2019, 8, 309–326; doi:10.3390/plants8090309.
