
DNA Microarray and Data Analysis

Brijesh Singh Yadav


E-mail: brijeshbioinfo@gmail.com
+91945257130
www.scribd.com/brijeshbioinfo
Introduction
• A DNA microarray is a multiplex technology used in
molecular biology and in medicine. It consists of an arrayed
series of thousands of microscopic spots of DNA
oligonucleotides, called features, each containing picomoles
(10^-12 moles) of a specific DNA sequence, known as probes
(or reporters).

• A probe can be a short section of a gene or other DNA element
that is used to hybridize a cDNA or cRNA sample (called the
target) under high-stringency conditions. Probe-target
hybridization is usually detected and quantified by
detection of fluorophore-, silver-, or chemiluminescence-
labeled targets to determine the relative abundance of nucleic
acid sequences in the target.
Probe Attachment

• In standard microarrays, the probes are attached
via surface engineering to a solid surface by a
covalent bond to a chemical matrix (via epoxy-
silane, amino-silane, lysine, polyacrylamide or
others). The solid surface can be glass or a silicon
chip; an Affymetrix chip is colloquially known as
an Affy chip. Other microarray platforms, such as
Illumina, use microscopic beads instead of a large
solid support.
Nucleic acid hybridization
• The core principle behind microarrays is
hybridization between two DNA strands, the
property of complementary nucleic acid
sequences to specifically pair with each other by
forming hydrogen bonds between complementary
nucleotide base pairs. A high number of
complementary base pairs in a nucleotide
sequence means tighter non-covalent bonding
between the two strands. After non-specifically
bound sequences are washed off, only strongly
paired strands remain hybridized.
Hybridization
• AT and GC baseparing

 
• Affected by temperature, pH,
and ion concentration
– Higher temperature  higher
stringency                       
– Lower temperature  more non-
specific binding             
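The pairing rule above can be sketched in a few lines. This is only an illustration: the sequences are made up, the function names are hypothetical, and real hybridization strength also depends on temperature, pH and ion concentration, which this sketch ignores.

```python
# Watson-Crick partners for DNA bases (A pairs with T, G pairs with C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that pairs perfectly with seq (both read 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_positions(probe, target):
    """Count positions where target, aligned antiparallel, pairs with probe."""
    if len(probe) != len(target):
        raise ValueError("this sketch only compares equal-length strands")
    return sum(COMPLEMENT[p] == t for p, t in zip(probe, reversed(target)))

probe = "AGTCAGTC"                       # made-up probe sequence
perfect = reverse_complement(probe)      # fully complementary target
mismatched = "GACTGTCT"                  # made-up target with one mismatch
print(paired_positions(probe, perfect), paired_positions(probe, mismatched))
```

The more positions that pair, the more likely the target is to stay bound through a stringent wash.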
DNA Microarray
[Figure: two-color microarray workflow. Genes A, B and C are amplified by PCR from the E. coli chromosome and spotted on a glass slide by the arrayer (probe). mRNA purified from Organism-1 and Organism-2 is labeled during RT (target), hybridized to the array (16 hr, 42 °C), washed and scanned. Image analysis: spots for Genes A, B and C show signal from Organism-2, Organisms-1,2 and Organism-1.]
On the surface
[Figure: a target strand (T C A G) carrying a fluorescent tag hybridizes to its complementary probe (A G T C) on the probe array.]

In the Affymetrix system the meanings
of probe and target are reversed
Comparison
• GeneChip: expensive, high density,
absolute value measurement, fixed
design
• cDNA microarray: cheap, low
density, relative value measurement,
free design
• Oligoarray: cheap, low density,
relative value measurement, free
design
cDNA Spotted Microarrays
DNA microarrays can be used to measure
changes in expression levels, to detect
single nucleotide polymorphisms (SNPs), or
to genotype or resequence mutant
genomes.
Microarrays also differ in fabrication,
workings, accuracy, efficiency, and cost.
Additional factors
for microarray experiments are the
experimental design and the methods of
analyzing the data.
Data Acquisition
• Scan the arrays
• Quantitate each spot
• Subtract background
• Normalize
• Export a table of fluorescence
intensities for each gene in the
array
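The background-subtraction and export steps might look like the sketch below. The spot values, gene labels, function names and the floor of 1.0 are all assumptions for illustration; real scanner software reports its own columns. Normalization is left out of this sketch.

```python
import csv
import io

def background_subtract(foreground, background, floor=1.0):
    """Subtract local background from each spot; clamp at a small floor."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]

def export_table(genes, intensities):
    """Write a gene/intensity table as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene", "intensity"])
    for gene, value in zip(genes, intensities):
        writer.writerow([gene, f"{value:.1f}"])
    return buf.getvalue()

genes = ["geneA", "geneB", "geneC"]      # hypothetical spot labels
foreground = [1200.0, 310.0, 95.0]       # made-up raw spot intensities
background = [100.0, 110.0, 105.0]       # made-up local background
corrected = background_subtract(foreground, background)
table = export_table(genes, corrected)
print(table)
```

The floor keeps spots dimmer than their background from producing zero or negative intensities, which would break later ratio and log calculations.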
Basic Data Analysis
• Fold change (relative increase or
decrease in intensity for each gene)
• Set cutoff filter for low values
(background + noise)
• Cluster genes by similar changes -
only really meaningful across
multiple treatments or time points
• Cluster samples by similar gene
expression profiles
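As a minimal sketch of the first two bullets, the code below computes a log2 fold change per gene and drops genes whose intensity is below a cutoff in either sample. The cutoff of 50, the 2-fold threshold and all intensities are assumed values, not taken from the slides.

```python
import math

def log2_fold_changes(control, treated, min_intensity=50.0):
    """Return {gene: log2(treated/control)} for genes above the cutoff in both samples."""
    result = {}
    for gene in control:
        c, t = control[gene], treated[gene]
        if c < min_intensity or t < min_intensity:
            continue  # too close to background + noise to trust the ratio
        result[gene] = math.log2(t / c)
    return result

control = {"geneA": 400.0, "geneB": 800.0, "geneC": 20.0}    # made-up intensities
treated = {"geneA": 1600.0, "geneB": 200.0, "geneC": 90.0}
changes = log2_fold_changes(control, treated)
# Keep genes changed at least 2-fold in either direction (|log2 ratio| >= 1).
changed = {g: m for g, m in changes.items() if abs(m) >= 1.0}
print(changed)
```

Working in log2 makes up- and down-regulation symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.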
[Figure: scatter plot of all genes in a simple comparison of two controls (A) and two treatments (B: high vs. low glucose), showing changes in expression greater than 2.2 and 3 fold.]
[Figure: cluster by color difference.]
Microarray Data Variability
• Microarray data are inherently highly
variable - you are measuring mRNA
levels
• Any measurement of thousands of values
will find some large differences due to
chance (normal distribution)
• Must have replication and statistics to
show that differences are real
• Use REAL replicates (different patients,
different experiments); don’t just split
samples.
Sources of Variability
• Image analysis (identifying and
quantitating each spot on the array)
• Scanning (laser and detector, chemistry
of the fluorescent label)
• Hybridization (temperature, time,
mixing, etc.)
• Probe labeling
• RNA extraction
• Biological variability
Normalization
• Can control for many of the experimental
sources of variability (systematic, not
random or gene specific)
• Bring each image to the same average
brightness
• Can use simple or more sophisticated math:
– divide by the mean (whole chip or by sectors)
– LOESS (locally weighted regression)
• No definitive biological standards
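The "divide by the mean" option can be sketched as follows, for the whole chip only. The function name and intensities are made up; the by-sector variant and LOESS are not shown.

```python
from statistics import mean

def scale_to_common_mean(arrays, target=None):
    """Divide each array by its own mean, then rescale to a shared target mean."""
    if target is None:
        target = mean(mean(a) for a in arrays)
    return [[x * target / mean(a) for x in a] for a in arrays]

chip1 = [100.0, 200.0, 300.0]   # made-up intensities
chip2 = [200.0, 400.0, 600.0]   # same pattern, scanned twice as bright
norm1, norm2 = scale_to_common_mean([chip1, chip2])
print(norm1, norm2)
```

After scaling, both chips have the same average brightness, so the remaining differences between them are not due to overall scan intensity.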
Normalization
• Normalization within the same experiment
corrects for the efficiency differences of the
fluorescent dyes (Cy3, Cy5) by using
house-keeping gene expression
• Global normalization across different
experiments uses total expression
or certain external chemicals
added to every experiment
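One way to picture house-keeping normalization is to shift all log ratios so that the house-keeping genes show no change. This sketch assumes those genes really are unchanged between samples; using the median as the offset is a choice made here, and the gene names and intensities are hypothetical.

```python
import math
from statistics import median

def normalize_log_ratios(cy5, cy3, housekeeping):
    """Shift log2(Cy5/Cy3) so the house-keeping genes center on zero."""
    raw = {g: math.log2(cy5[g] / cy3[g]) for g in cy5}
    offset = median(raw[g] for g in housekeeping)   # estimated dye bias
    return {g: m - offset for g, m in raw.items()}

cy5 = {"hk1": 800.0, "hk2": 1600.0, "geneX": 3200.0}   # made-up intensities
cy3 = {"hk1": 400.0, "hk2": 800.0, "geneX": 400.0}
ratios = normalize_log_ratios(cy5, cy3, housekeeping=["hk1", "hk2"])
print(ratios)
```

In this made-up data the Cy5 channel reads twice as bright for the house-keeping genes, so one log2 unit is removed from every gene and geneX drops from an apparent 8-fold change to a 4-fold change.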
Are the Treatments
Different?
• Analysis of microarray data has tended to focus
on making lists of genes that are up or down
regulated between treatments
• Before making these lists, ask the question:
"Are the treatments different?"
• Use standard statistical methods to evaluate
expression profiles for each treatment (t-test or
F-test)
• If there are differences, find the genes most
responsible
• If there are not significant overall differences,
then lists of genes with large fold changes may
only reflect random variability.
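A minimal sketch of a two-group comparison for one gene follows. The slides name the t-test or F-test; because the Python standard library has no t distribution, this sketch reports Welch's t statistic and gets its p-value from an exact permutation test on the difference in means, which is a substituted technique. The replicate values are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent groups."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def permutation_p_value(a, b):
    """Exact two-sided p-value: share of relabelings with a mean gap at least as large."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

control = [7.1, 6.9, 7.0, 7.2]   # made-up log2 intensities, 4 replicates
treated = [8.1, 8.3, 7.9, 8.2]
t_stat = welch_t(control, treated)
p_value = permutation_p_value(control, treated)
print(t_stat, p_value)
```

With four replicates per group there are only 70 possible relabelings, so the smallest achievable two-sided p-value is 2/70. That limit is one concrete reason small experiments struggle to show that differences are real.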
Sample Variability
• Use paired samples - normal & cancer
or before & after treatment from the
same patient, 6 & 24 hours from the
same cell culture
• What is the variability of two samples
from the same patient?
– any two surgical samples have different
amounts of various cell types
– different day, different environmental and
metabolic factors
Multiple Comparisons
• In a microarray experiment, each
gene (each probe or probe set) is
really a separate experiment

• Yet if you treat each gene as an
independent comparison, you will
always find some with significant
differences
– (the tails of a normal distribution)
False Discovery
• Statisticians call a false positive a "type 1 error"
or a "false discovery"
• The expected number of false positives is roughly the
p-value cutoff of the t-test × the number of genes in the
array; the False Discovery Rate (FDR) is the fraction of
genes called "different" that are false positives
– For a p-value cutoff of 0.01 × 10,000 genes
= 100 false “different” genes
– You cannot eliminate false positives, but by
choosing a more stringent p-value, you can
keep them manageable (try p=0.001)
• The number of false positives must be smaller than the
number of real differences that you find - which in turn
depends on the size of the differences and
variability of the measured expression values
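The arithmetic above (p-value cutoff × number of genes) gives an expected count of false positives when no gene truly changes. A common way to control the false discovery rate directly is the Benjamini-Hochberg procedure; it is not mentioned in the slides and is shown here only as a sketch, with made-up p-values and hypothetical function names.

```python
def expected_false_positives(p_cutoff, n_genes):
    """Expected count of genes passing the cutoff if no gene truly changes."""
    return p_cutoff * n_genes

def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of tests declared significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            largest_k = rank          # remember the largest rank that passes
    return sorted(order[:largest_k])

print(expected_false_positives(0.01, 10_000))
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]   # made-up per-gene p-values
significant = benjamini_hochberg(p_values, fdr=0.05)
print(significant)
```

Four of the six made-up p-values are below 0.05, but only the two smallest pass the rank-scaled thresholds.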
Gene-Specific Variability
• Different probes will hybridize to mRNAs with
different efficiency
– microarrays can only measure relative
change of expression, not absolute levels
• Cross-hybridization
– Gene families
– Chance similarity of short oligo sequence
• Affy mismatch signal is much greater than
perfect-match signal for many probes
• Different Affy probes for the same gene show
huge differences in hybridization intensity
• Alternative splicing!!
Statistics
• When you have variability in
measurements, you need replication
and statistics to find real differences
• It’s not just the genes with a 2-fold
increase, but those with a significant
p-value across replicates
• Non-parametric (i.e. rank) or paired
value statistics may be more
appropriate
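As one example of a rank-based statistic, the sketch below computes the Mann-Whitney U statistic for two groups. The slides do not name a specific test, so this choice is an assumption; the values are made up, and turning U into a p-value is left out.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1      # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(a, b):
    """U for group a: number of (a, b) pairs where a is larger (ties count half)."""
    ranks = average_ranks(a + b)
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

treated = [5.1, 6.3, 7.0]   # made-up expression values
control = [1.2, 2.4, 5.1]
u = mann_whitney_u(treated, control)
print(u)
```

Because only the ordering of the values matters, a single extreme intensity cannot dominate the result the way it can in a t-test.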
Experimental Design
• Real replicates!
(same treatment, same biological source,
different RNA prep, labeling,
hybridization, and scanning)
• Dye reversal for two-color hybridizations
• Block design (don’t do the experiment on one
day and the control on another)
• Work with a Statistician!!
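Dye reversal works because the dye bias has the same sign on both arrays while the treatment effect flips sign. A minimal sketch for one gene, assuming the bias is additive on the log2 scale and using made-up intensities:

```python
import math

def dye_swap_estimate(forward, reverse):
    """Combine a dye-swapped pair of (Cy5, Cy3) intensities for one gene.

    forward: treatment labeled Cy5, control labeled Cy3
    reverse: treatment labeled Cy3, control labeled Cy5
    """
    m_forward = math.log2(forward[0] / forward[1])   # effect + dye bias
    m_reverse = math.log2(reverse[0] / reverse[1])   # -effect + dye bias
    effect = (m_forward - m_reverse) / 2
    dye_bias = (m_forward + m_reverse) / 2
    return effect, dye_bias

effect, bias = dye_swap_estimate(forward=(1600.0, 200.0), reverse=(400.0, 800.0))
print(effect, bias)
```

Here the forward array alone would suggest an 8-fold change; averaging with the reversed array separates it into a 4-fold treatment effect and a 2-fold dye bias.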
Higher Level
Microarray data analysis

• Clustering and pattern detection
• Data mining and visualization
• Controls and normalization of results
• Statistical validation
• Linkage between gene expression data and
gene sequence/function/metabolic pathways
databases
• Discovery of common sequences in co-
regulated genes
• Meta-studies using data from multiple
experiments
Types of Clustering
• Hierarchical
– Link similar genes, build up to a tree of
all
• Self Organizing Maps (SOM)
– Split all genes into similar sub-groups
– Finds its own groups (machine learning)
• Principal Component Analysis (PCA) and SVD
– every gene is a dimension (vector), find
a single dimension that best represents
the differences in the data
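Hierarchical clustering can be sketched with the standard library alone. This version uses average linkage with a correlation-based distance and stops at a chosen number of clusters instead of building the full tree; both choices, and the expression profiles, are assumptions for illustration. A profile with no variation would divide by zero in this sketch.

```python
from statistics import mean

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite ones."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1 - cov / (sx * sy)

def hierarchical_clusters(profiles, n_clusters):
    """Average-linkage agglomerative clustering down to n_clusters groups."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster gene pairs.
        return mean(pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

profiles = {                      # made-up expression over four time points
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.2, 6.0, 3.9, 2.0],
}
groups = hierarchical_clusters(profiles, n_clusters=2)
print(groups)
```

Correlation distance groups genes by the shape of their profile, not by absolute intensity, so geneA and geneB cluster together even though geneB is about twice as bright.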
Public Databases
• Gene Expression data is an essential
aspect of annotating the genome
• Publication and data exchange for
microarray experiments
• Data mining/Meta-studies
• Common data format - XML
• MIAME (Minimal Information About a
Microarray Experiment)
Send your suggestions to me at my email: brijeshbioinfo@gmail.com

The entire content of this PowerPoint was obtained from the internet.
