What is quantifying?
Quantification means that sequencing is used to determine not the
composition of DNA fragments but their abundance.
When classifying against a genome, the approach intersects the resulting
alignment file with annotations to produce abundances that are then
filtered to retain statistically significant results.
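The intersection of alignments with annotations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layouts, the half-open coordinates, and the function name count_overlaps are hypothetical and do not come from any particular tool.

```python
def count_overlaps(alignments, annotations):
    """Count aligned reads that overlap each annotated feature.

    alignments:  list of (chrom, start, end) tuples, half-open intervals (assumed layout)
    annotations: list of (gene, chrom, start, end) tuples (assumed layout)
    """
    counts = {gene: 0 for gene, _, _, _ in annotations}
    for chrom, start, end in alignments:
        for gene, g_chrom, g_start, g_end in annotations:
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == g_chrom and start < g_end and end > g_start:
                counts[gene] += 1
    return counts


# Hypothetical toy data: two genes on chr1 and four aligned reads.
annotations = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 800, 1200)]
alignments = [
    ("chr1", 120, 220),   # inside geneA
    ("chr1", 450, 550),   # overlaps the end of geneA
    ("chr1", 900, 1000),  # inside geneB
    ("chr2", 100, 200),   # different chromosome, counted nowhere
]
counts = count_overlaps(alignments, annotations)
```

The nested loop is quadratic and ignores strand and ambiguous reads; dedicated counting programs handle those cases, so this is a conceptual sketch rather than a replacement for them.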
B) Classifying against a transcriptome
When classifying against a transcriptome, the approach directly produces
abundances that are then filtered to retain statistically significant
results.
What is normalization?
When we assign values to the same labels in different samples, it
becomes essential that these values are comparable across the samples.
The process of ensuring that these values are expressed on the same
scale is called normalization.
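As a minimal sketch of putting values on the same scale, the example below rescales raw counts to counts per million (CPM). CPM is chosen here only as one simple illustration of normalization; the sample data and the function name counts_per_million are hypothetical.

```python
def counts_per_million(counts):
    """Rescale raw counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical samples: sample_2 was sequenced twice as deeply as sample_1.
sample_1 = {"geneA": 100, "geneB": 300, "geneC": 600}
sample_2 = {"geneA": 200, "geneB": 600, "geneC": 1200}

cpm_1 = counts_per_million(sample_1)
cpm_2 = counts_per_million(sample_2)
```

The raw counts differ by a factor of two, yet after scaling both samples give the same value for each gene, which is what makes them comparable.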
What is FPKM?
FPKM is an extension of the already flawed concept of RPKM to paired-end
reads. Whereas RPKM refers to reads, FPKM computes the same values
over read pair fragments.
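The text does not spell out the formula, so the sketch below assumes the commonly used definition: fragments mapped to a feature, per kilobase of feature length, per million mapped fragments. The function name fpkm and the numbers are hypothetical.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of feature per million mapped fragments.

    Assumed definition: 10^9 * fragments / (length_bp * total_fragments).
    Passing read counts instead of fragment counts gives RPKM.
    """
    return 1e9 * fragments / (length_bp * total_fragments)


# Hypothetical values: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
value = fpkm(fragments=500, length_bp=2000, total_fragments=10_000_000)
```

For paired-end data each read pair counts as one fragment, which is the only difference from RPKM in this sketch.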
What is TPM?
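TPM (transcripts per million) rescales the length-normalized count N / L
of each transcript so that the values in a sample sum to one million:
TPM = 10^6 * (N / L) * 1 / sum(N / L)
where N is the number of reads assigned to the transcript, L is the
transcript length, and the sum runs over all transcripts in the sample.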
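A minimal sketch of the TPM calculation follows, assuming that "sum" in the formula denotes the sum of N / L over all transcripts in the sample. The transcript names, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million: 10^6 * (N / L) / sum(N / L)."""
    # Length-normalized rate for each transcript.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    return {t: 1e6 * rate / total for t, rate in rates.items()}


# Hypothetical read counts (N) and transcript lengths in bp (L).
counts = {"tA": 100, "tB": 200, "tC": 300}
lengths = {"tA": 1000, "tB": 1000, "tC": 3000}

values = tpm(counts, lengths)
```

Here tC has the most reads but is three times longer, so it ends up with the same TPM as tA. By construction the values always sum to one million within a sample.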
You have to use padj in all other cases as this adjusted value corrects for
the so-called multiple testing error - it accounts for the many alternatives
and their chances of influencing the results that we see.
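The text does not say which correction produces padj. The sketch below assumes the Benjamini-Hochberg procedure, a common choice for adjusted p-values in differential expression tools; the function name and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(raw)
```

Each adjusted value is at least as large as its raw p-value, reflecting the penalty for having tested many genes at once.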
Transcriptomics data
All transcriptomic methods require RNA to first be isolated from the
experimental organism before transcripts can be recorded. Although
biological systems are incredibly diverse, RNA extraction techniques are
broadly similar and involve mechanical disruption of cells or tissues,
separation of RNA from undesired biomolecules including DNA, and
concentration of the RNA via precipitation from solution or elution from a
solid matrix.
Microarrays
Microarrays consist of short nucleotide oligomers, known as "probes",
which are typically arrayed in a grid on a glass slide. Transcript
abundance is determined by hybridisation of fluorescently labelled
transcripts to these probes. The fluorescence intensity at each probe
location on the array indicates the transcript abundance for that probe
sequence.
RNA-Seq
RNA-Seq refers to the combination of a high-throughput sequencing
methodology with computational methods to capture and quantify
transcripts present in an RNA extract. The nucleotide sequences
generated are typically around 100 bp in length, but can range from 30
bp to over 10,000 bp depending on the sequencing method used. Both
low-abundance and high-abundance RNAs can be quantified in an RNA-Seq
experiment. RNA-Seq may be used to identify genes within a genome or to
identify which genes are active at a particular point in time, and read
counts can be used to accurately model relative gene expression levels.
RNA-Seq data analysis
RNA-Seq experiments generate a large volume of raw sequence reads
which have to be processed to yield useful information. Data analysis
usually requires a combination of bioinformatics software tools that vary
according to the experimental design and goals. The process can be
broken down into four stages: quality control, alignment, quantification,
and differential expression. Nowadays, most popular RNA-Seq programs
are run from a command-line interface, either in a Unix environment or
within the R/Bioconductor statistical environment.