Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome
Replicative form of viral genomes:
• All ssRNA viruses produce dsRNA molecules
• Many linear DNA molecules become circular
Molecular weight and contour length:
• Duplex length per nucleotide pair = 3.4 Å
• Mol. weight per base pair = ~660
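The two rules of thumb above (3.4 Å and ~660 per base pair) can be applied directly to the lambda genome. The following is a minimal sketch; the function names are illustrative, and the molecular weight unit is assumed to be daltons.

```python
# Rules of thumb from the slide; units for the weight are assumed to be daltons.
ANGSTROM_PER_BP = 3.4   # duplex length per nucleotide pair
DALTON_PER_BP = 660     # approximate molecular weight per base pair


def contour_length_um(n_bp):
    """Contour length of a duplex in micrometres (1 um = 10^4 Angstrom)."""
    return n_bp * ANGSTROM_PER_BP / 1e4


def molecular_weight_da(n_bp):
    """Approximate molecular weight of a duplex in daltons."""
    return n_bp * DALTON_PER_BP


lambda_bp = 48_502  # lambda genome size given on the slide
print(f"contour length: {contour_length_um(lambda_bp):.2f} um")
print(f"molecular weight: {molecular_weight_da(lambda_bp):.2e} Da")
```

For lambda this gives a contour length of roughly 16.5 µm and a molecular weight of about 3.2 x 10^7.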
Procaryotic genomes
• Generally 1 circular chromosome (dsDNA)
• Usually without introns
• Relatively high gene density (~2500 genes per mm of E. coli DNA)
• Contour length of E. coli genome: 1.7 mm
• Often indigenous plasmids are present
Plasmids: extrachromosomal circular DNAs
• Found in bacteria, yeast and other fungi
• Size varies from ~3,000 bp to 100,000 bp
• Replicate autonomously (origin of replication)
• May contain resistance genes
• May be transferred from one bacterium to another
• May be transferred across kingdoms
• Multicopy plasmids (up to ~400 plasmids per cell)
• Low copy plasmids (1–2 copies per cell)
• Plasmids may be incompatible with each other
• Are used as vectors that can carry a foreign gene of interest (e.g. insulin)
Eukaryotic genome
• Moderately repetitive
  – Functional (protein coding, tRNA coding)
  – Unknown function
    • SINEs (short interspersed elements)
      – 200-300 bp
      – 100,000 copies
    • LINEs (long interspersed elements)
      – 1-5 kb
      – 10-10,000 copies
Eukaryotic genome
• Highly repetitive
  – Minisatellites
    • Repeats of 14-500 bp
    • 1-5 kb long
    • Scattered throughout genome
  – Microsatellites
    • Repeats up to 13 bp
    • 100s of kb long, 10^6 copies
    • Around centromere
  – Telomeres
    • Short repeats (6 bp)
    • 250-1,000 at ends of chromosomes
Eucaryotic genomes
• Located on several chromosomes
• Relatively low gene density (50 genes per mm of DNA in humans)
• Contour length of DNA from a single human cell = 2 meters
• Approximately 10^11 cells = total length 2 x 10^11 m (2 x 10^8 km)
• For comparison, distance between sun and earth = 1.5 x 10^8 km
• Human chromosomes vary in length over a 25-fold range
• Cells carry organelle genomes as well
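The length figures on this slide can be checked with a few lines of arithmetic. This is a rough sketch that assumes a diploid cell, uses the 3.4 Å per base pair rule and the 3200 x 10^6 bp human genome size quoted elsewhere in these slides, and takes the slide's cell count of 10^11 at face value. Variable names are illustrative.

```python
RISE_PER_BP_M = 3.4e-10        # 3.4 Angstrom per base pair, in metres
HAPLOID_GENOME_BP = 3200e6     # human genome size from the genome size table
CELLS = 1e11                   # cell count used on the slide
SUN_EARTH_KM = 1.5e8           # sun-earth distance used on the slide

# A diploid cell carries two copies of the genome.
dna_per_cell_m = 2 * HAPLOID_GENOME_BP * RISE_PER_BP_M

# Use the slide's rounded value of 2 m per cell for the total.
total_m = 2.0 * CELLS
total_km = total_m / 1000.0
ratio_to_sun_earth = total_km / SUN_EARTH_KM

print(f"DNA per diploid cell: {dna_per_cell_m:.2f} m")
print(f"Total DNA length: {total_m:.1e} m = {total_km:.1e} km")
print(f"Relative to sun-earth distance: {ratio_to_sun_earth:.2f}x")
```

Working through the units, 2 m per cell times 10^11 cells is 2 x 10^11 m, which is 2 x 10^8 km, slightly more than one sun-earth distance.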
Mitochondrial genome (mtDNA)
• Multiple identical circular chromosomes
• Size ~15 kb in animals
• Size ~200 kb to 2,500 kb in plants
• Over 95% of mitochondrial proteins are encoded in the nuclear genome
• Often A+T rich genomes
• mtDNA is replicated before or during mitosis
Chloroplast genome (cpDNA)
• Multiple circular molecules
• Size ranges from 120 kb to 160 kb
• Similar to mtDNA
• Many chloroplast proteins are encoded in the nucleus (separate signal sequence)
“Cellular” Genomes
• Viruses: Capsid, Viral genome
• Procaryotes: Bacterial chromosome, Plasmids
• Eucaryotes: Nucleus with Chromosomes (Nuclear genome), Mitochondrial genome, Chloroplast genome
Genome: all of an organism's genes plus intergenic DNA
Intergenic DNA = DNA between genes
Estimated genome sizes
[Figure: ranges of genome size for mammals, plants, fungi, bacteria (>100), mitochondria (~100) and viruses (1024). Axis: size in nucleotides, 1e1 to 1e12. Number in ( ) = completely sequenced genomes.]
Size of genomes
Organism              Genome size (bp)
Epstein-Barr virus    0.172 x 10^6
E. coli               4.6 x 10^6
S. cerevisiae         12.1 x 10^6
C. elegans            95.5 x 10^6
A. thaliana           117 x 10^6
D. melanogaster       180 x 10^6
H. sapiens            3200 x 10^6
Chromosome organization
Eucaryotic chromosome
[Figure: chromosome with Telomere, p-arm, Centromere, q-arm, Telomere]
Centromere:
• DNA sequence that serves as an attachment site for proteins during mitosis.
• In yeast these sequences (~130 nts) are very A+T rich.
• In higher eucaryotes centromeres are much longer and contain “satellite DNA”
Telomeres:
• At the end of chromosomes, help stabilize the chromosome
• In yeast telomeres are ~100 bp long (imperfect repeats)
• Repeats are added by a specific telomerase
  5’ – (TxGy)n
  3’ – (AxCy)n
  x and y = 1-4, n = 20 to 100 (1500 in mammals)
Gene classification
[Figure: chromosome (simplified) divided into coding genes, non-coding genes and intergenic regions]
• Coding genes: Messenger RNA -> Proteins (Structural proteins, Enzymes)
• Non-coding genes: Structural RNA (transfer RNA, ribosomal RNA, other RNA)
What is a gene?
Definitions
1. Classical definition: Portion of a DNA that determines a single character (phenotype)
2. One gene – one enzyme (Beadle & Tatum 1940): “Every gene encodes the information for one enzyme”
3. One gene – one protein: “One gene contains information for one protein (structural proteins included)”; one gene – one polypeptide
4. Current definition: A piece of DNA (or in some cases RNA) that contains the primary sequence to produce a functional biological gene product (RNA, protein).
Coding region
Nucleotides (open reading frame) encoding the amino acid sequence of a protein
The molecular definition of gene includes more than just the coding region
Noncoding regions
• Regulatory regions
  – RNA polymerase binding site
  – Transcription factor binding sites
• Introns
• Polyadenylation [poly(A)] sites
Gene
Molecular definition: Entire nucleic acid sequence necessary for the synthesis of a functional polypeptide (protein chain) or functional RNA
Anatomy of a gene
• ORF. From start (ATG) to stop (TGA, TAA, TAG)
• Upstream region with binding site (e.g. TATA box)
• Poly-A 'tail'
• Splices. Bounded by AG and GT splice signals.
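The ORF definition above (from ATG to the first in-frame TGA, TAA or TAG) translates into a short scan. This is a minimal sketch, not a gene finder: it reads only the forward strand, ignores introns and regulatory signals, and the function name and `min_codons` parameter are illustrative.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive).
    Coordinates are 0-based, end-exclusive. All three frames are scanned.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # include the stop codon
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None                   # look for the next ATG
    return orfs


print(find_orfs("CCATGGCTTAAGG"))
```

In the toy sequence the only ORF is ATG GCT TAA, found in the third reading frame.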
Bacterial genes
• Most do not have introns
• Many are organized in operons: contiguous genes, transcribed as a single polycistronic mRNA, that encode proteins with related functions
Polycistronic mRNA encodes several proteins
Bacterial operon
What would be the effect of a mutation in the control region (a) compared to a mutation in a structural gene (b)?
Eucaryotic genes
Hemoglobin beta subunit gene:
Exon 1 (90 bp) | Intron A (131 bp) | Exon 2 (222 bp) | Intron B (851 bp) | Exon 3 (126 bp)
Introns: intervening sequences within a gene that are not translated into a protein sequence. Collagen has 50 introns.
Exons: sequences within a gene that encode protein sequences
Splicing: Removal of introns from the primary RNA transcript (pre-mRNA).
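Splicing can be illustrated with the segment lengths given for the hemoglobin beta subunit gene. The sketch below uses a placeholder transcript ("E" for exon bases, "i" for intron bases) rather than the real sequence, and the helper names are hypothetical.

```python
# Segment lengths (bp) in 5'->3' order, as listed on the slide.
SEGMENTS = [
    ("Exon 1", 90),
    ("Intron A", 131),
    ("Exon 2", 222),
    ("Intron B", 851),
    ("Exon 3", 126),
]


def exon_coordinates(segments):
    """Return 0-based, end-exclusive (start, end) pairs for the exons."""
    coords, pos = [], 0
    for name, length in segments:
        if name.startswith("Exon"):
            coords.append((pos, pos + length))
        pos += length
    return coords


def splice(transcript, exons):
    """Join the exon slices, which removes everything in between."""
    return "".join(transcript[start:end] for start, end in exons)


# Placeholder transcript: 'E' marks exon bases, 'i' marks intron bases.
toy_transcript = "".join(
    ("E" if name.startswith("Exon") else "i") * length
    for name, length in SEGMENTS
)
exons = exon_coordinates(SEGMENTS)
mature = splice(toy_transcript, exons)

print(len(toy_transcript), "bp before splicing")
print(len(mature), "bp after splicing =", len(mature) // 3, "codons")
```

With these lengths the 1420 bp region is reduced to 438 bp of exon sequence, which is 146 codons.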
Regulatory mechanisms
• 'Organize expression of genes' (function calls)
• Promoter region (binding site), usually near coding region
• Binding can block (inhibit) expression
• Computational challenges
  – Identify binding sites
  – Correlate sequence to expression
Eukaryotic genes
• Most have introns
• Produce monocistronic mRNA: only one encoded protein
• Large
Alternative splicing
• Splicing is the removal of introns
• mRNA from some genes can be spliced into two or more different mRNAs
“Nonfunctional” DNA
[Figure label: 80 kb]
• Higher eukaryotes have a lot of noncoding DNA
• Some has no known structural or regulatory function (no genes)
Types of eukaryotic DNA
Duplicated genes
• Encode closely related (homologous) proteins
• Clustered together in genome
• Formed by duplication of an ancestral gene followed by mutation
[Figure: five functional genes and two pseudogenes]
Pseudogenes
• Nonfunctional copies of genes
• Formed by duplication of ancestral gene, or reverse transcription (and integration)
• Not expressed due to mutations that produce a stop codon (nonsense or frameshift) or prevent mRNA processing, or due to lack of regulatory sequences
Repetitive DNA
• Moderately repeated DNA
  – Tandemly repeated rRNA, tRNA and histone genes (gene products needed in high amounts)
  – Large duplicated gene families
  – Mobile DNA
• Simple-sequence DNA
  – Tandemly repeated short sequences
  – Found in centromeres and telomeres (and others)
  – Used in DNA fingerprinting to identify individuals
Types of DNA repeats
Perfect repeats vs degenerate repeats
Tandem repeats (e.g. satellite DNA)
  5’-CATGTGCTGAAGGCTATGTGCTGCGACG-3’
  3’-GTACACGACTTCCGATACACGACGCTGC-5’
Inverted repeats (e.g. in transposons)
  5’-CATGTGCTGAAGGCTCAGCACATCGACG-3’
  3’-GTACACGACTTCCGAGTCGTGTAGCTGC-5’
• Form stem-loop structures
Palindromes = adjacent inverted repeats (e.g. restriction sites)
• Form hairpin structures
[Figure: hairpin with Stem and Loop labelled]
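An inverted repeat is a stretch whose reverse complement occurs further along the same strand, and a palindrome is the special case with no spacer. The sketch below checks both on the example sequences from the slide. The function names, the 8 bp arm length and the 10 bp loop limit are illustrative choices, and the EcoRI site GAATTC is added as a familiar example of a restriction-site palindrome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def find_inverted_repeats(seq, arm=8, max_loop=10):
    """Return (left_start, right_start, left_arm) for each arm-length word
    whose reverse complement starts within max_loop bases downstream."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + arm + loop
            if seq[j:j + arm] == target:
                hits.append((i, j, left))
    return hits


inverted = "CATGTGCTGAAGGCTCAGCACATCGACG"   # inverted-repeat example from the slide
tandem = "CATGTGCTGAAGGCTATGTGCTGCGACG"     # tandem-repeat example from the slide

print(reverse_complement("ATGTGCTG"))
print(is_palindrome("GAATTC"))
print(find_inverted_repeats(inverted))
print(find_inverted_repeats(tandem))
```

The inverted-repeat example contains ATGTGCTG followed downstream by its reverse complement CAGCACAT, so the two arms can pair into a stem with the bases between them forming the loop. The tandem example repeats ATGTGCTG in the same orientation and produces no hits.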
Repetitive sequences
[Figure: caesium chloride density gradient separating satellite DNA from chromosomal DNA]
Repeats in the mouse genome
Type                    No. of repeats   Size               Percent of genome
Highly repetitive       > 1 million      < 10 bp            10 %
Moderately repetitive   > 1000           ~150 to ~300 bp    20 %
DNA repeats and forensics
Gender determination
1) Standard technique: PCR amplification of the amelogenin locus (Males = XY => 103 + 109 bp)
2) AluSTXa: Alu insertion on X
3) AluSTYa: Alu insertion on Y
[Figure: X-Y homologous regions on X and Y showing the Alu sequence insertion; gels with lanes M, F and Suspect. AluSTXa bands: 878 bp and 556 bp. AluSTYa bands: 528 bp and 199 bp.]
Mobile DNA
• Move within genomes
• Most of moderately repeated DNA sequences found throughout higher eukaryotic genomes
  – L1 LINE is ~5% of human DNA (~50,000 copies)
  – Alu is ~5% of human DNA (>500,000 copies)
• Some encode enzymes that catalyze movement
Transposition
• Movement of mobile DNA
• Involves copying of mobile DNA element and insertion into new site in genome
Why?
• Molecular parasite: “selfish DNA”
• Probably have significant effect on evolution by facilitating gene duplication, which provides the fuel for evolution, and exon shuffling
RNA or DNA intermediate
• Transposon moves using DNA intermediate
• Retrotransposon moves using RNA intermediate
Types of mobile DNA elements
LTR (long terminal repeat)
• Flank viral retrotransposons and retroviruses
• Contain regulatory sequences: transcription start site and poly(A) site
LINES and SINES
• Non-viral retrotransposons
  – RNA intermediate
  – Lack LTR
• LINES (long interspersed elements)
  – ~6000 to 7000 base pairs
  – L1 LINE (~5% of human DNA)
  – Encode enzymes that catalyze movement
• SINES (short interspersed elements)
  – ~300 base pairs
  – Alu (~5% of human DNA)
Proteins
• Most protein sequences (today) are inferred
• What's wrong with this?
• Proteins (and nucleic acids) are modified
• 'Mature' RNA
• Computational challenges
  – Identify (possible) aspects of molecular life cycle
  – Identify protein-protein and protein-nucleic acid interactions
Genetic variation
• Variable number tandem repeats (minisatellites). 10-100 bp. Forensic applications.
• Short tandem repeat polymorphisms (microsatellites). 2-5 bp, 10-30 consecutive copies.
• Single nucleotide polymorphisms
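Short tandem repeats as described above (a 2-5 bp unit in at least 10 consecutive copies) can be located with a back-referencing regular expression. This is a simplified sketch for perfect repeats only. The function name and default thresholds are illustrative, and skipping single-base runs is a design choice made here rather than part of the definition on the slide.

```python
import re


def find_microsatellites(dna, min_unit=2, max_unit=5, min_copies=10):
    """Return (start, unit, copies) for perfect short tandem repeats.

    The lazy quantifier makes the regex prefer the shortest repeat unit,
    and the back-reference \\1 requires exact copies of that unit.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip runs of a single base such as AAAA...
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


toy = "GGCT" + "CA" * 12 + "TTAG"   # synthetic sequence with a (CA)12 repeat
print(find_microsatellites(toy))
```

The number of copies at a given locus is what varies between individuals, which is why the copy count is reported alongside the position and unit.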
Single nucleotide polymorphisms
• 1/2000 bp.
• Types
  – Silent
  – Truncating
  – Shifting
• Significance: much of individual variation.
• Challenge: correlation to disease
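Whether a single-base substitution in a coding region is silent or truncating can be read off the standard genetic code. The sketch below classifies one codon change at a time. The "missense" label for amino acid changes is added here for completeness and is not one of the types listed on the slide, and frame shifts are not covered because they arise from insertions or deletions rather than substitutions. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-base substitution within one codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"        # same amino acid, or stop stays stop
    if alt_aa == "*":
        return "truncating"    # new stop codon
    return "missense"          # different amino acid (label not on the slide)


print(classify_snp("TTA", "TTG"))   # Leu -> Leu
print(classify_snp("TGG", "TGA"))   # Trp -> stop
print(classify_snp("GAG", "GTG"))   # Glu -> Val
```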
E. coli genome
• 4.6 x 10^6 bp. One chromosome. Published 1997.
• 4,285 protein-coding genes
• 122 structural RNA genes
• Repeats. Regulatory elements. Transposons.
• Lateral transfers.
E. coli protein functions
Function                  Genes   Percent
Regulatory                   45     1.05%
Cell structure              182     4.24%
Transposons, etc.            87     2.03%
Transport & binding         281     6.55%
Putative transport          146     3.40%
Replication, repair         115     2.68%
Transcription                55     1.28%
Translation                 182     4.24%
Enzymes                     251     5.85%
Unknown                    1632    38.06%