Genomics Lectures 9 To 14-2023 PDF

Genome Analysis
1000 Genomes: A Deep Catalog of Human Genetic Variation

1000 Genomes Project
• The 1000 Genomes Project is an international research consortium
formed to create the most detailed and medically useful picture to date of
human genetic variation.
• The project involves sequencing the genomes of approximately 1200
people from around the world and receives major support from the
Wellcome Trust Sanger Institute in Hinxton, England, the Beijing Genomics
Institute Shenzhen in China and the National Human Genome Research
Institute (NHGRI), part of the National Institutes of Health (NIH).
• Drawing on the expertise of multidisciplinary research teams, the 1000
Genomes Project will develop a new map of the human genome that will
provide a view of biomedically relevant DNA variations at a resolution
unmatched by current resources.
• As with other major human genome reference projects, data from the
1000 Genomes Project will be made swiftly available to the worldwide
scientific community through freely accessible public databases.
1000 Genomes Project Strategy
• The goal of the 1000 Genomes Project is to find most genetic variants that have
frequencies of at least 1% in the populations studied. This goal can be attained by
sequencing many individuals lightly.
• To sequence a person's genome, many copies of the DNA are broken into short
pieces and each piece is sequenced. The many copies of DNA mean that the DNA
pieces are more-or-less randomly distributed across the genome. The pieces are
then aligned to the reference sequence and joined together.
• To find the complete genomic sequence of one person with current sequencing
platforms requires sequencing that person's DNA the equivalent of about 28 times
(called 28X). If the amount of sequence done is only an average of once across the
genome (1X), then much of the sequence will be missed, because some genomic
locations will be covered by several pieces while others will have none.
• The deeper the sequencing coverage, the more of the genome will be covered at
least once. Also, people are diploid; the deeper the sequencing coverage, the more
likely that both chromosomes at a location will be included. In addition, deeper
coverage is particularly useful for detecting structural variants, and allows
sequencing errors to be corrected.
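The coverage argument above can be sketched numerically. This is a minimal illustration, assuming reads land independently and uniformly so that the depth at any position is Poisson distributed (an idealization that is not stated in the slides; real data are more uneven). The function names are illustrative only.

```python
import math


def fraction_covered(depth: float) -> float:
    """Expected fraction of positions hit by at least one read.

    Under a Poisson model with mean `depth`, the chance that a position
    receives zero reads is exp(-depth).
    """
    return 1.0 - math.exp(-depth)


def prob_both_haplotypes_seen(depth: float) -> float:
    """Chance that both chromosome copies at a site are each read at least once.

    Assumes reads are split evenly between the two copies, so each copy
    has mean depth `depth / 2`, and the two copies are sampled independently.
    """
    per_copy = 1.0 - math.exp(-depth / 2.0)
    return per_copy ** 2


if __name__ == "__main__":
    for depth in (1.0, 4.0, 28.0):
        print(
            f"{depth:>4.0f}X  covered>=1: {fraction_covered(depth):.4f}  "
            f"both copies seen: {prob_both_haplotypes_seen(depth):.4f}"
        )
```

Under these assumptions, 1X coverage leaves roughly a third of positions unread, which matches the statement that much of the sequence is missed at 1X.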
1000 Genomes Project Strategy
• Sequencing is still too expensive to deeply sequence the many samples being
studied for this project.
• However, any particular region of the genome generally contains a limited
number of haplotypes.
• Data can be combined across many samples to allow efficient detection of most
of the variants in a region.
• The Project currently plans to sequence each sample to about 4X coverage; at
this depth sequencing cannot provide the complete genotype of each sample, but
should allow the detection of most variants with frequencies as low as 1%.
• Combining the data from 2500 samples should allow highly accurate estimation
(imputation) of the variants and genotypes for each sample that were not seen
directly by the light sequencing.
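A rough sketch of why light sequencing of many samples can still detect a 1% variant. It assumes Poisson-distributed read depth, an even split of reads between the two chromosome copies, and no sequencing error; these are simplifications introduced here, not part of the Project's actual calling pipeline. The requirement of two supporting reads is likewise an arbitrary illustrative threshold.

```python
import math


def expected_variant_reads(n_samples: int, allele_freq: float, depth: float) -> float:
    """Expected number of reads carrying the variant allele across all samples.

    There are 2 * n_samples chromosome copies, a fraction `allele_freq` of them
    carry the variant, and each copy is read depth / 2 times on average.
    """
    return 2 * n_samples * allele_freq * (depth / 2.0)


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    depth = 4.0
    # One heterozygous sample on its own: variant copy has mean depth 2.
    single = prob_at_least(2, depth / 2.0)
    print(f"single sample, >=2 variant reads: {single:.3f}")

    # 2500 samples pooled, variant at 1% frequency.
    pooled_mean = expected_variant_reads(2500, 0.01, depth)
    print(f"pooled expected variant reads: {pooled_mean:.0f}")
    print(f"pooled, >=2 variant reads: {prob_at_least(2, pooled_mean):.6f}")
```

The single-sample probability is well below 1, which is why a 4X genotype is incomplete, while the pooled evidence for the variant is very strong.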
1000 Genomes Project Timelines
• January 22, 2008: International Consortium Announces the 1000 Genomes Project
• June 21, 2010: 1000 Genomes Project Releases Data from Pilot Projects on Path
to Providing Database for 2,500 Human Genomes (2,500 unidentified people from
about 25 populations around the world)
• December 16, 2010: Sequencing of 629 individuals completed
• October 12, 2011: An integrated set of variant calls and phased genotypes,
including SNPs, short indels and deletions, based on low-coverage and exome
sequencing data across 1,092 individuals
For details see: http://www.internationalgenome.org/
The 1000 Genomes Project Publications
The main publications from the 1000 Genomes Project are the final
publications from phase 3 of the project, which were published
in Nature in October 2015.
“A global reference for human genetic variation” Nature 526 68-74 2015
“An integrated map of structural variation in 2,504 human genomes” Nature 526 75-81 2015
The Consortium also produced publications from the earlier data
phases of the project, which were the initial pilot and phase 1 of the
main project. No equivalent paper was produced for phase 2, which
focused on technical development work.
“An integrated map of genetic variation from 1,092 human genomes” Nature 491 56-65 2012
“A map of human genome variation from population-scale sequencing” Nature 467 1061-1073 2010
GenomeAsia100K Consortium
https://genomeasia100k.org/
Nature. 2019 Dec;576(7785):106-111. doi: 10.1038/s41586-019-1793-z. Epub 2019 Dec 4.
The GenomeAsia 100K Project enables genetic discoveries across Asia.
Abstract
The underrepresentation of non-Europeans in human genetic studies so
far has limited the diversity of individuals in genomic datasets and led to
reduced medical relevance for a large proportion of the world's
population. Population-specific reference genome datasets as well as
genome-wide association studies in diverse populations are needed to
address this issue. Here we describe the pilot phase of the GenomeAsia
100K Project. This includes a whole-genome sequencing reference
dataset from 1,739 individuals of 219 population groups and 64 countries
across Asia. We catalogue genetic variation, population structure,
disease associations and founder effects. We also explore the use of this
dataset in imputation, to facilitate genetic studies in populations across
Asia and worldwide.
Genome Analysis Tools
Two Broad Genomics Research Areas
Functional Genomics Study Techniques
How to measure the pattern of gene expression in a given tissue over a period of time?
1. Northern Blot
2. In situ hybridization
3. RT-PCR
4. Gene Chip / DNA Microarray

RT-PCR (Reverse Transcription PCR)
• Used for amplifying a defined piece of an mRNA molecule.
• Traditionally RT-PCR involves two steps:
(i) RT reaction
(ii) PCR amplification.
RNA is first reverse transcribed into cDNA (complementary DNA).
So, the first step of RT-PCR is:
• Isolation of mRNA from the cell
• Next, make cDNA from the mRNA
• This is reversing “transcription”, so use an enzyme originally obtained from
viruses: REVERSE TRANSCRIPTASE
RT efficiency:
Random hexamer primers > poly-dT primer > gene-specific primers
2nd step of RT-PCR:
• The resulting cDNA is used as the template for subsequent PCR amplification
using primers for one or more genes.
• RT-PCR can also be carried out as one-step RT-PCR, in which all reaction
components are mixed in one tube prior to starting the reaction (this would
require hot-start Taq).
RT– PCR
• Application:
– allows for a high sensitivity detection technique, where
low copy number or less abundant mRNA molecules
can be detected. Used in gene expression studies.
Real Time PCR
• Real-time PCR was developed because of the need to quantitate differences in
mRNA expression.
• Conventional PCR does not yield truly quantitative data because of the difficulty of
observing the reaction during the exponential (log-linear) part of the amplification process.
• Particularly valuable when amounts of RNA are low (e.g. small amounts of
tissue; primary cells).
• SYBR Green is a dye that binds to double-stranded DNA but not to single-stranded
DNA and is frequently used to monitor the synthesis of DNA during real-time PCR
reactions.
Real Time PCR
• kinetic approach
• early stages
• while still linear
[Figure: real-time PCR detection optics (www.biorad.com): 1. halogen tungsten lamp; 2a. excitation filters; 2b. emission filters; 3. intensifier; 4. sample plate; 5. CCD detector (350,000 pixels)]
Real Time PCR
So, how to measure differences in concentration of DNA or cDNA?
This graph shows a series of 10-fold dilutions of a sample.
As one dilutes the sample, it takes more cycles before the amplification is detectable.
Samples that differ by a factor of 2 are expected to be 1 cycle apart.
Samples that differ by 10-fold would be ~3.3 cycles apart.
Note: If the plateau values are 4000 to 15000, a threshold of 300
usually works well.
Same data plotted on a logarithmic scale. It is easy to get the Ct values from this plot.
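The cycle spacings quoted above follow from the fact that each cycle ideally doubles the product. A short sketch, where the `efficiency` parameter (1.0 meaning perfect doubling) is an assumption added here for illustration and is not discussed in the slides:

```python
import math


def cycles_apart(fold_difference: float, efficiency: float = 1.0) -> float:
    """Number of cycles separating two samples whose starting amounts
    differ by `fold_difference`.

    Each cycle multiplies the product by (1 + efficiency), so the spacing
    is the logarithm of the fold difference in that base.
    """
    return math.log(fold_difference) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    for fold in (2, 10, 100):
        print(f"{fold:>4}-fold difference -> {cycles_apart(fold):.2f} cycles")
```

With perfect doubling, a 2-fold difference gives exactly 1 cycle and a 10-fold difference gives log2(10), about 3.3 cycles, in line with the dilution series described above.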
Relative Expression = 2^(-ΔΔCt)
Condition | Mouse | Gene A | Actin | ΔCt | ΔΔCt | Rel Expression
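A minimal sketch of the 2^(-ΔΔCt) calculation implied by the table columns above, with Actin as the reference gene. The Ct values in the example are invented purely for illustration; they are not data from the lecture.

```python
def relative_expression(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Relative expression of a target gene by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference), computed per condition
    ddCt = dCt(test condition) - dCt(control condition)
    """
    dct_test = ct_target_test - ct_ref_test
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_test - dct_control
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: Gene A and Actin in a test and a control sample.
    fold = relative_expression(
        ct_target_test=22.0, ct_ref_test=18.0,
        ct_target_control=24.0, ct_ref_control=18.0,
    )
    print(f"Gene A relative expression (test vs control): {fold:.1f}")
```

In this made-up case Gene A crosses the threshold two cycles earlier in the test sample after normalizing to Actin, so ΔΔCt is -2 and the relative expression is 4-fold.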

How do you generate accurate q-PCR data?
(i) Good quality RNA
(ii) No genomic DNA contamination: DNase I treatment, primer
design strategy, and a no-reverse-transcriptase control
(iii) Ensuring there is no non-specific amplification: check by gel electrophoresis
Semi-quantitative RT-PCR

How do you generate accurate q-PCR data?
More than one internal control is better
Real-time PCR was carried out using the DyNAmo™ HS SYBR® Green qPCR Kit
(Finnzymes, USA) and the following Hmgcr gene-specific primers. For normalization of Hmgcr
expression, GAPDH and 18S rRNA abundances were measured using the following primer
pairs. The relative gene expression levels were determined by calculating the 2^(-ΔΔCt) values.
Sonawane et al. 2011. Functional Promoter Polymorphisms Govern Differential Expression of
HMG-CoA Reductase Gene in Mouse Models of Essential Hypertension. PLoS ONE 6(1): e16661.
doi:10.1371/journal.pone.0016661.
GeneChip Technology
Miniaturized, high density arrays
Expression arrays: 1,300,000 DNA oligos, 1 cm by 1 cm
DNA mapping array: 7,000,000 DNA oligos, 1.3 cm by 1.3 cm
Manufacturing Process
Solid-phase chemical synthesis and photolithographic fabrication
techniques employed in the semiconductor industry
DNA Microarrays
Photolithographic Synthesis
Manufacturing Process
Probe arrays are manufactured by a light-directed chemical
synthesis process, which enables the synthesis of hundreds of
thousands of discrete compounds in precise locations.
[Figure: lamp, mask and chip in light-directed synthesis]
Computer algorithms are used to design photolithographic masks for use in
manufacturing.
Affymetrix Wafer and Chip Format
[Figure: wafer and chip format: 49 - 400 chips/wafer; chip 1.0 cm; up to ~1.3 million features/chip; each feature ("pixel") is 20 - 50 µm by 20 - 50 µm and holds one oligonucleotide sequence]
Selection of Expression Probes
[Figure: probes selected along the gene sequence; 3’ end labelled]
• Set of oligos to be synthesized is defined from the sense sequence of known
genes and ESTs
• Each gene is represented on the probe array by multiple probe pairs
• Each probe pair consists of a perfect match and a mismatch oligonucleotide
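A small sketch of how a mismatch probe relates to its perfect-match partner. The slides only state that each pair has a perfect match and a mismatch oligonucleotide; the specifics used here (a 25-base probe whose central base is swapped for its complement) are an assumption based on the commonly described GeneChip design, and the example sequence is made up.

```python
# Swap used for the central base of the mismatch probe (assumed design).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match: str) -> str:
    """Return a mismatch probe: identical to the perfect-match probe
    except that the middle base is replaced by its complement."""
    probe = perfect_match.upper()
    middle = len(probe) // 2  # index 12 (the 13th base) for a 25-mer
    swapped = COMPLEMENT[probe[middle]]
    return probe[:middle] + swapped + probe[middle + 1:]


if __name__ == "__main__":
    pm = "ATGCGTACGTTAGCCTAGGCTAACG"  # hypothetical 25-mer
    mm = mismatch_probe(pm)
    print("PM:", pm)
    print("MM:", mm)
```

The mismatch probe serves as a control for non-specific hybridization: signal that appears on both probes of a pair is less likely to come from the intended transcript.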

Overview: Creating Targets
mRNA → (Reverse Transcriptase) → cDNA → (in vitro transcription) → cRNA → Fragmentation of cRNA
GeneChip Hybridization
mRNA → cRNA → Fragmentation of biotinylated cRNA
Fragmentation: metal-mediated, alkali-induced hydrolysis
Hybridization and Staining
Fragmented cRNA target + array → RNA:DNA hybridized array → staining with streptavidin phycoerythrin [fluorescent dye]
Instrumentation
Affymetrix GeneChip System
3000-7G Scanner
450 Fluidic Station
640 Hybridization Oven
Currently Available GeneChips
Expression Arrays
B. subtilis
Barley Genome Array
Bovine Genome Array
C. elegans Genome Array
Canine Genome Array
Chicken Genome Array
Drosophila Genome Arrays
E. coli Genome Arrays
Human Genome Arrays
Maize Genome Array
Mouse Genome Arrays
P. aeruginosa Genome Array
Plasmodium Genome Array
Porcine Genome Array
Rat Genome Arrays
Rice Genome Array
Soybean Genome Array
Sugar Cane Genome Array
Vitis vinifera (Grape) Array
Wheat Genome Array
Xenopus laevis Genome Array
Yeast Genome Arrays
Zebrafish Genome Array
Arabidopsis Genome Arrays
Hybridization of fluorescently labeled cDNA preparations to DNA microarrays
This technique is useful for analyzing gene expression patterns on a genomic scale
Data Analysis
Absolute Analysis: whether transcripts are present or not (uses data from one
probe array experiment).
Comparison Analysis: determines the relative change in transcripts (uses data
from two probe array experiments).
Intensities for each experiment are compared to a baseline/control.
Validation of Gene Chip data
Genomics Tools
Phage Display
This is a very powerful genomics tool to discover the interaction of a protein
with an immobilized target (e.g. a likely disease susceptibility molecule:
an enzyme, a receptor).
cDNA library
• Accurate and complete representation of all mRNA
sequences expressed in a cell, tissue, or organism.
• Facilitates analysis of sequences when only interested in
mRNA and the protein it encodes.
• Since many eukaryotic genes have introns, analysis of the
mRNA simplifies deciphering the coding regions of a
gene.
• The protein encoded and its function can be predicted
with some accuracy from sequence analysis of the cDNA,
without knowing in advance what has been cloned.
How to clone cDNA:
• cDNA has blunt ends, so restriction site linkers need to be added
to make the ends “sticky”.
• Use T4 DNA ligase and blunt-end ligation to add restriction
site linkers to each end of the cDNA.
• Next, digest the linkers with the same restriction enzyme
used to cleave the vector.
• Mix cDNA with cut vector DNA in the presence of DNA
ligase.
• If the cDNA itself contains the same restriction site as the linkers,
the cDNA will be cloned in pieces. Solution: use adapters with
single-stranded overhangs that match the restriction site
on the vector.
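The last point can be checked in silico before choosing linkers. The sketch below scans a cDNA sequence for internal BamHI sites (recognition sequence GGATCC, cut after the first G on the top strand) and shows the pieces that digestion would produce. The example sequence is made up, and only the top strand is modelled.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence, cut as G^GATCC


def internal_sites(cdna: str, site: str = BAMHI_SITE) -> list[int]:
    """Return the 0-based start positions of every occurrence of `site`."""
    sequence = cdna.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def digest(cdna: str, site: str = BAMHI_SITE, cut_offset: int = 1) -> list[str]:
    """Split the top strand at each site, `cut_offset` bases into the site."""
    sequence = cdna.upper()
    fragments = []
    previous = 0
    for position in internal_sites(sequence, site):
        cut = position + cut_offset
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


if __name__ == "__main__":
    cdna = "ATGAAAGGATCCTTTGCATAA"  # hypothetical cDNA with one internal site
    print("internal BamHI sites at:", internal_sites(cdna))
    print("fragments after digestion:", digest(cdna))
```

If the list of internal sites is non-empty, digesting BamHI linkers would also cut the insert, which is the situation where pre-cut adapters are the safer choice.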
Linker and adapter
[Figure: cloning of cDNA using BamHI linkers]
Alternative: use adapter
5’-GATCCAGAC-3’
3’-    GTCTG-5’
Cloning cDNA in Bacteriophages
Plaque formation on a lawn of bacterial cells
Amplification of Phage Libraries
• The primary library consists of individual recombinant phage
particles.
• If the sequence of interest has not been found, more
recombinant DNA will have to be produced and
packaged, or the library amplified.
• Amplification of the library is achieved by plating the
packaged phage on a suitable E. coli strain (e.g. BLT
5616), and then resuspending the plaques by gently
washing the plates with a buffer solution. The resulting
phage suspension can be stored almost indefinitely and
will provide enough material for many screening and
isolation procedures.
Amplification of Libraries
Disadvantages
1. Some recombinant phage may be lost, perhaps due to
the presence of repetitive sequences in the insert giving
rise to recombinational instability. This can be minimized
by plating on a recombination-deficient host.
2. Some phage may exhibit differential growth
characteristics, which may cause a particular phage to be
over-represented or under-represented in the amplified
library. This may mean that a greater number of plaques
has to be screened to isolate the desired gene.
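The effect of under-representation on screening effort can be estimated with a standard sampling argument: if a clone makes up a fraction f of the library, the number of plaques N needed to find it at least once with probability P satisfies N = ln(1 - P) / ln(1 - f). This formula and the fractions used below are illustrative assumptions added here; the slides give no numbers.

```python
import math


def plaques_to_screen(clone_fraction: float, confidence: float = 0.99) -> int:
    """Plaques needed so that a clone present at `clone_fraction` of the
    library is picked at least once with probability `confidence`.

    Assumes each plaque is an independent random draw from the library.
    """
    needed = math.log(1.0 - confidence) / math.log(1.0 - clone_fraction)
    return math.ceil(needed)


if __name__ == "__main__":
    original = 1e-5       # hypothetical fraction in the primary library
    after_bias = 5e-6     # same clone, halved by slower growth during amplification
    print("plaques needed before amplification bias:", plaques_to_screen(original))
    print("plaques needed after amplification bias: ", plaques_to_screen(after_bias))
```

Halving a clone's representation roughly doubles the number of plaques that must be screened for the same chance of recovering it.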
Phage Display
Biopanning
Bound phage is eluted and amplified.
Single plaques are isolated, followed by PCR amplification of phage DNA.
Phage Display
8: Identification
Sequencing of the phage DNA identifies the candidate protein/peptide.
