As with other major human genome reference projects, data from the
1000 Genomes Project will be made swiftly available to the worldwide
scientific community through freely accessible public databases.
1000 Genomes Project Strategy
The goal of the 1000 Genomes Project is to find most genetic variants that have
frequencies of at least 1% in the populations studied. This goal can be attained by
sequencing many individuals lightly.
To sequence a person's genome, many copies of the DNA are broken into short
pieces and each piece is sequenced. The many copies of DNA mean that the DNA
pieces are more-or-less randomly distributed across the genome. The pieces are
then aligned to the reference sequence and joined together.
To find the complete genomic sequence of one person with current sequencing
platforms requires sequencing that person's DNA the equivalent of about 28 times
(called 28X). If the amount of sequence done is only an average of once across the
genome (1X), then much of the sequence will be missed, because some genomic
locations will be covered by several pieces while others will have none.
The deeper the sequencing coverage, the more of the genome will be covered at
least once. Also, people are diploid; the deeper the sequencing coverage, the more
likely that both chromosomes at a location will be included. In addition, deeper
coverage is particularly useful for detecting structural variants, and allows
sequencing errors to be corrected.
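The relationship between average depth and the fraction of the genome covered can be sketched with a simple Poisson model. This is an idealization, assuming reads land independently and uniformly (the "more-or-less randomly distributed" pieces described above); the function name and the example depths other than 1X and 28X are illustrative choices, not part of the project's methods.

```python
import math

def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """Fraction of positions hit by at least `min_reads` reads,
    assuming the per-position read count is Poisson(mean_depth)."""
    p_fewer = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer

# Compare light (1X), intermediate (4X, hypothetical) and deep (28X) coverage.
for depth in (1, 4, 28):
    print(f"{depth}X: {fraction_covered(depth):.4f} of positions covered at least once")
```

Under this model, 1X coverage leaves a fraction e^-1 (roughly 37%) of positions with no reads at all, which is why light sequencing of a single sample misses much of the sequence.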
Sequencing is still too expensive to deeply sequence the many samples being
studied for this project.
Data can be combined across many samples to allow efficient detection of most
of the variants in a region.
Combining the data from 2500 samples should allow highly accurate estimation
(imputation) of the variants and genotypes for each sample that were not seen
directly by the light sequencing.
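Why pooling helps can be illustrated with a rough calculation. The sketch below assumes every read at a site is drawn from a random chromosome in the pooled samples, so the number of variant-carrying reads is approximately Poisson with mean samples x depth x allele frequency. The per-sample depth of 4X and the requirement of two supporting reads are hypothetical values chosen for illustration, and the model ignores sequencing errors and genotype correlation within a sample.

```python
import math

def prob_variant_seen(n_samples: int, depth: float,
                      allele_freq: float, min_reads: int = 1) -> float:
    """Probability of at least `min_reads` variant-carrying reads at one site,
    pooled over all samples (Poisson approximation, hypothetical model)."""
    lam = n_samples * depth * allele_freq  # expected variant-carrying reads
    p_fewer = sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer

# A 1% variant, 4X per sample (assumed), at least 2 supporting reads required.
for n in (1, 50, 2500):
    p = prob_variant_seen(n, depth=4, allele_freq=0.01, min_reads=2)
    print(f"{n} samples: P(seen) = {p:.4f}")
```

With a single lightly sequenced sample the variant is almost never supported by two reads, whereas pooling thousands of samples makes detection nearly certain under these assumptions.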
1000 Genomes Project Timelines
June 21, 2010: 1000 Genomes Project Releases Data from Pilot Projects on Path
to Providing Database for 2,500 Human Genomes (2500 unidentified people from
about 25 populations around the world)
October 12, 2011: An integrated set of variant calls and phased genotypes,
including SNPs, short indels, and deletions, based on low-coverage and exome
sequencing data across 1,092 individuals.
The main publications from the 1000 Genomes Project are the final
publications from phase 3 of the project, which were published
in Nature in October 2015.
“A global reference for human genetic variation.” Nature 526, 68-74 (2015)
“An integrated map of structural variation in 2,504 human
genomes.” Nature 526, 75-81 (2015)
https://genomeasia100k.org/
Abstract
The underrepresentation of non-Europeans in human genetic studies so
far has limited the diversity of individuals in genomic datasets and led to
reduced medical relevance for a large proportion of the world's
population. Population-specific reference genome datasets as well as
genome-wide association studies in diverse populations are needed to
address this issue. Here we describe the pilot phase of the GenomeAsia
100K Project. This includes a whole-genome sequencing reference
dataset from 1,739 individuals of 219 population groups and 64 countries
across Asia. We catalogue genetic variation, population structure,
disease associations and founder effects. We also explore the use of this
dataset in imputation, to facilitate genetic studies in populations across
Asia and worldwide.
Genome Analysis Tools
Two Broad Genomics Research Areas
Functional Genomics Study Techniques
1. Northern Blot
2. In situ hybridization
3. RT-PCR
(i) RT reaction
(ii) PCR amplification.
RT efficiency:
Random hexamer primers > poly-dT primer > gene-specific primers
2nd step of RT PCR:
• Application:
– allows for a high sensitivity detection technique, where
low copy number or less abundant mRNA molecules
can be detected. Used in gene expression studies.
Real Time PCR
• Real-time PCR was developed because of the need to quantitate differences in
mRNA expression.
• Conventional PCR does not yield truly quantitative data because it is difficult to
observe the reaction during the exponential (log-linear) part of the amplification process.
• Particularly valuable when amounts of RNA are low (e.g., small amounts of
tissue, primary cells).
• SYBR Green is a dye that binds to double-stranded DNA but not to single-stranded
DNA and is frequently used to monitor the synthesis of DNA during real-time PCR
reactions.
Real Time PCR
• kinetic approach
• early stages
• while still linear
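The kinetic idea can be sketched with the standard exponential amplification model, N_n = N_0 (1 + E)^n, where E is the per-cycle efficiency. The function name, the detection threshold, and the starting copy numbers below are hypothetical values for illustration; real instruments derive the threshold cycle (Ct) from measured fluorescence rather than from copy numbers.

```python
import math

def threshold_cycle(start_copies: float, threshold_copies: float,
                    efficiency: float = 1.0) -> float:
    """Cycle at which start_copies * (1 + efficiency)**n reaches threshold_copies.
    efficiency = 1.0 means perfect doubling every cycle."""
    return math.log(threshold_copies / start_copies) / math.log(1.0 + efficiency)

# Two hypothetical samples differing 10-fold in starting template.
ct_low = threshold_cycle(1e3, 1e10)
ct_high = threshold_cycle(1e4, 1e10)
print(f"Ct (1e3 copies) = {ct_low:.2f}")
print(f"Ct (1e4 copies) = {ct_high:.2f}")
print(f"Ct difference   = {ct_low - ct_high:.2f} cycles")
```

At perfect doubling, a 10-fold difference in starting template shifts Ct by log2(10), about 3.32 cycles. This relationship holds only while amplification is still exponential, which is why measurements are taken in the early stages.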
[Figure: real-time PCR detection optics. 1. halogen tungsten lamp; 2a. excitation filters; 2b. emission filters; 3. intensifier; 4. sample plate; 5. CCD detector (350,000 pixels). Source: www.biorad.com]
Real Time PCR
Real-time PCR was carried out using the DyNAmo™ HS SYBR® Green qPCR Kit
(Finnzymes, USA) and the following Hmgcr gene-specific primers. For normalization of Hmgcr
expression, GAPDH and 18S rRNA abundances were measured using the following primer
pairs. The relative gene expression levels were determined by calculating the 2^(-ΔΔCt) values.
Sonawane et al. 2011. Functional Promoter Polymorphisms Govern Differential Expression of HMG-
CoA Reductase Gene in Mouse Models of Essential Hypertension. PLoS ONE 6(1): e16661.
doi:10.1371/journal.pone.0016661.
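The 2^(-ΔΔCt) calculation mentioned above can be sketched as follows. This is a minimal illustration of the general comparative-Ct method, not the cited authors' analysis code; the function name and all Ct values are hypothetical, and the formula assumes that target and reference genes amplify with close to 100% efficiency.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene in sample vs. control by 2^(-ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in the sample
    delta_control = ct_target_control - ct_ref_control   # dCt in the control
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: target gene normalized to a reference gene
# (for example GAPDH) in a test sample and in a control sample.
fold = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(f"Relative expression (sample vs. control): {fold:.1f}-fold")
```

Here the normalized Ct is two cycles lower in the sample than in the control, which corresponds to a 4-fold higher expression level.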
GeneChip Technology
Manufacturing Process
Probe arrays are manufactured by a light-directed chemical
synthesis process, which enables the synthesis of hundreds of
thousands of discrete compounds in precise locations.
[Figure: light-directed synthesis. Labels: lamp, mask, chip; 20 - 50 µm x 20 - 50 µm, one oligonucleotide sequence per “pixel”; 49 - 400 chips/wafer; 1.0 cm; up to ~1.3 million features/chip]
Selection of Expression Probes
[Figure labels: 3’, sequence, probes]
cDNA → in vitro transcription → cRNA → fragmentation of cRNA
GeneChip Hybridization
[Figure labels: mRNA, cRNA, fragmentation of biotinylated cRNA]
Fragmentation: metal-mediated, alkali-induced hydrolysis
Hybridization and Staining
Array
Streptavidin-phycoerythrin [fluorescent dye]
Instrumentation
Affymetrix GeneChip System
3000-7G Scanner
450 Fluidic Station
640 Hybridization Oven
Currently Available GeneChips
Expression Arrays
B. subtilis                   Plasmodium Genome Array
Barley Genome Array           Porcine Genome Array
Bovine Genome Array           Rat Genome Arrays
C. elegans Genome Array       Rice Genome Array
Canine Genome Array           Soybean Genome Array
Chicken Genome Array          Sugar Cane Genome Array
Drosophila Genome Arrays      Vitis vinifera (Grape) Array
E. coli Genome Arrays         Wheat Genome Array
Human Genome Arrays           Xenopus laevis Genome Array
Maize Genome Array            Yeast Genome Arrays
Mouse Genome Arrays           Zebrafish Genome Array
P. aeruginosa Genome Array    Arabidopsis Genome Arrays
Hybridization of fluorescently labeled cDNA preparations to DNA microarrays
This technique is useful for analyzing gene expression patterns on a genomic scale
Data Analysis
Alternative: use an adapter
5’-GATCCAGAC-3’
    3’-GTCTG-5’
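The pairing of the two adapter strands can be checked with a short script. This is an illustrative sketch only: the helper name is hypothetical, and the sequences are taken from the adapter shown above, with the short strand rewritten in the 5'→3' direction.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

top = "GATCCAGAC"   # long strand, 5'->3'
bottom = "GTCTG"    # short strand, 5'->3' (drawn 3'->5' in the slide)

paired = reverse_complement(bottom)        # region of the top strand it pairs with
assert top.endswith(paired)                # the short strand anneals to the 3' end
overhang = top[: len(top) - len(paired)]   # unpaired 5' extension of the top strand

print("Paired region of top strand:", paired)
print("Single-stranded 5' overhang:", overhang)
```

The annealed adapter has one blunt end and a four-base 5'-GATC overhang, the same cohesive end that BamHI digestion produces.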
Cloning cDNA in Bacteriophages
Plaque formation on a lawn of bacterial cells
Amplification of Phage Libraries
Disadvantages
1. Some recombinant phage may be lost, perhaps because
repetitive sequences in the insert give rise to
recombinational instability. This can be minimized
by plating on a recombination-deficient host.
Bound phage is eluted and amplified.
Single plaques are isolated, followed by PCR amplification of phage DNA.
Identification
Biopanning
Phage Display
8: Identification