
Structure of Nucleic Acids

Hydrophobic AAs: Alanine, Valine, Isoleucine, Leucine, Methionine, Phenylalanine, Tyrosine, Tryptophan
o R groups with no polar bonds, except tyrosine and tryptophan
Hydrophilic AAs:
o Basic: Lysine, Arginine, Histidine
o Acidic: Aspartate, Glutamate
o Polar: Serine, Threonine, Asparagine, Glutamine
Special AAs: Cysteine [SH], Glycine [H], Proline
Lampbrush chromosome: site of RNA synthesis
DNA contains all the information required to build cells and tissues
Transcription: process by which information stored as dsDNA is copied into ssRNA
Two types of genes:
o Informational: encodes an RNA (mRNA) that is translated into protein
o Noninformational: encodes an RNA that is not translated into protein, e.g. rRNA
Translation: process by which information in RNA (produced by transcription) is used to build polymers of AAs (proteins)
Nucleus: where the chromosomes (DNA) are located; site of transcription, RNA processing and replication, and a site of virus attack
Nucleolus: site of rRNA synthesis
Cytoplasm: site of translation and locus of RNA virus attack; contains the ribosomes, amino acids, tRNA and TFs (transcription factors)
Purines: Adenine, Guanine
Pyrimidines: Uracil, Cytosine, Thymine
Phosphodiester linkage joins the 3' OH group of one nucleotide to the 5' phosphate of the next
RNA has 100-10,000 nucleotides
Cellular DNA can contain ~100,000,000 nucleotides
5' end has a free phosphate group (attached to the 5' C)
3' end has a free hydroxyl group (attached to the 3' C)
DNA/RNA are directional (each strand has 5'→3' polarity)
Typical DNA structure: B-DNA (right-handed helix)
o Other DNA structures:
  o A-DNA (dehydrated B-DNA): more compact right-handed helix
  o Z-DNA: left-handed helix
In vivo DNA pairing: GC (3 H bonds) and TA (2 H bonds)
GU pairs are quite common in higher-level structures of RNA formed by local folding (RNA is single stranded)
dsDNA is flexible about its long axis
o Binding proteins can bend DNA
o DNA bending is required for packing into chromatin
TATA-box binding protein (TBP): transcription of most eukaryotic genes requires TBP to bind to their promoter region (a TATA box, typically a variation of TATAAA)
o During the promoter-TBP interaction, the dsDNA is bent by nearly 90°
DNA denaturation can be reversed by reversing the conditions that induced it
o Conditions under which DNA can be denatured:
  o Rise in temperature
  o Reduction of the concentration of stabilizing ions
  o Extremes of pH
  o Adding agents that destabilize H bonds (formamide or urea)
o For renaturation to occur, both strands must be complementary (high fidelity!!!)
o These de-/renaturation experiments are called nucleic acid hybridization experiments
Secondary RNA structure: hairpin, stem-loop (local double-helical stem regions)
Tertiary RNA structure: pseudoknot (formed from two stem-loop structures)
During transcription, for RNA polymerase II (Pol II) to use one strand as a template, the DNA must first be denatured (local melting)
The incoming substrates used by Pol II to form the new complementary strand are rNTPs (ribonucleotide triphosphates); these rNTPs base-pair with the template strand to create a new complementary strand
Pol II joins rNTPs sequentially in the 5'→3' direction by forming phosphodiester bonds
o Forming a phosphodiester bond from the triphosphate linkage is energetically favorable: the high-energy triphosphate of the incoming rNTP is broken and pyrophosphate is released
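The pairing rules above (GC with three H bonds, TA with two) and the requirement that both strands be complementary for renaturation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (perfect Watson-Crick pairing, no mismatches); the function names are made up for these notes.

```python
# Illustrative sketch only; names are hypothetical, not from any library.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # H bonds each base forms with its partner


def reverse_complement(strand: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def count_h_bonds(strand: str) -> int:
    """Total H bonds holding this strand to its perfect complement."""
    return sum(H_BONDS[base] for base in strand)


def can_renature(strand_a: str, strand_b: str) -> bool:
    """True only if the two strands (both written 5'->3') are fully complementary."""
    return strand_b == reverse_complement(strand_a)
```

For example, `count_h_bonds("ATGC")` counts 2 + 2 + 3 + 3 bonds. A duplex with more GC pairs has more H bonds per base pair, which is consistent with GC-rich DNA needing harsher conditions to denature.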

Transcription

Three stages of transcription:
o Initiation: RNA polymerase II (Pol II) binds to promoter sequences, locally denatures the DNA and catalyzes the first phosphodiester linkage
o Elongation: the polymerase advances 3'→5' along the template strand and produces complementary ssRNA 5'→3'
o Termination: Pol II reaches the stop site (a termination sequence; stop codons belong to translation), releases the completed RNA and dissociates from the DNA
Transcription initiation:
o Pol II binds to promoter sequences in dsDNA (start site near the 3' end of the template strand, stop site further along); this is called the closed complex
o Eukaryotic Pol II requires associated proteins called general transcription factors (gTFs) to find promoters and initiate transcription
o Pol II melts the dsDNA near the start site and forms a bubble of about 14 bp, which allows rNTPs to begin RNA synthesis; this is the open complex
o Pol II catalyzes the phosphodiester linkage of the first two rNTPs
Transcription elongation:
o Pol II advances 3'→5' along the template strand; the freshly synthesized RNA is called nascent RNA
o Multiple Pol II molecules can transcribe the same gene one after another, producing many copies of its RNA (typical when environmental conditions change)
o The elongation complex is hyperstable: Pol II cannot fall off
o Speed of elongation: ~1000 nucleotides/min
o At the stop site, Pol II releases the RNA and dissociates from the dsDNA
o Primary transcript: the completed RNA molecule (in eukaryotes, mRNA is not synthesized directly from DNA; it is made by processing the primary transcript)
Prokaryotic genome is very compact
o Genes in prokaryotes are arranged in operons (groups of genes with similar functions, which save time by being transcribed together on a single mRNA)
o Very few non-coding gaps in prokaryotes
o In prokaryotes, mRNA is synthesized directly from dsDNA
5' capping: as the 5' end of a nascent RNA chain emerges from RNA polymerase, a 5' cap structure (7-methylguanosine) is added by several associated enzymes
Polyadenylation: enzymatic addition of 100-250 A residues by poly(A) polymerase at the 3' end of the mRNA
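The elongation step described above (polymerase reads the template 3'→5' and builds RNA 5'→3') can be sketched as follows. This is a minimal illustration, not a model of Pol II itself: it ignores promoters, termination and processing, and the function name `transcribe` is hypothetical.

```python
# Minimal sketch of template-directed RNA synthesis; names are hypothetical.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template DNA base -> incoming rNTP


def transcribe(template_3_to_5: str) -> str:
    """Given the template strand written 3'->5', return the RNA written 5'->3'."""
    rna = []
    for base in template_3_to_5:     # polymerase advances 3'->5' along the template
        rna.append(RNA_PAIR[base])   # each new nucleotide is joined to the 3' end of the RNA
    return "".join(rna)
```

Because the template is read 3'→5' while the RNA grows 5'→3', the two strands are antiparallel and no reversal is needed when the template is written in the 3'→5' direction.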

The RNA chain undergoes intron excision and exon ligation (splicing)
o The first exon (at the 5' end) will always include a 5' UTR (untranslated region); the last exon will always include a 3' UTR
o The UTRs do not encode protein, but they contain elements that regulate translation of the mRNA and ribosome recruitment to the RNA strand
Open reading frame: the part of the RNA strand which encodes protein
Alternative splicing: variability in, and selection of, which exons are excised or ligated
o Leads to an extremely diverse range of possible proteins
o One example (fibronectin): liver cells (hepatocytes) excise exons EIIIB and EIIIA, but fibroblasts retain these exons, so the two cell types make different forms of the protein
o Alternative splicing increases protein diversity in eukaryotes
Elements which regulate transcription can span many kilobases and can even lie many kilobases away from the promoter (upstream or downstream)
Three types of RNA polymerases:
o Pol I: located in the nucleolus; transcribes precursor rRNA
o Pol II: transcribes mRNAs and four small nuclear RNAs (snRNAs) that take part in splicing
o Pol III: transcribes tRNA, 5S rRNA and other small RNAs
Bacterial RNA polymerase: 5 subunits
o Beta
o Beta prime
o Alpha one
o Alpha two
o Omega
Yeast Pol II: 12 subunits
o RPB1, RPB2, RPB3, RPB11, RPB6 plus additional common and enzyme-specific subunits
Largest subunit of Pol II has a carboxy-terminal domain (CTD); not present in prokaryotes!!!
o In mammals, the CTD is made up of 52 repeats of the heptapeptide Tyr-Ser-Pro-Thr-Ser-Pro-Ser [forms a rigid structure]
o RNA polymerase molecules that initiate transcription have unphosphorylated CTDs
o RNA polymerase molecules that are actively transcribing have phosphorylated CTDs
Transcription initiation requires the formation of a complex called the pre-initiation complex:
1. TBP binds to the TATA box
2. TBP is a component of TFIID (in vivo); TFIID has 13 other subunits called TBP-associated factors (TAFs)
3. TFIIA forms a complex with TFIID and the TATA box
4. At this point, the dsDNA bends sharply
5. TFIIB then binds to both the DNA and TBP
6. TFIIF, from outside the complex, binds to Pol II (which carries the CTD); this subcomplex now contains TFIIF and Pol II with its CTD
7. This subcomplex then binds beside TBP
8. TFIIE binds next
9. TFIIH (with nine subunits) binds next, completing the pre-initiation complex
10. As elongation starts, all factors except TBP are released; TBP remains at the promoter so that another round of transcription initiation can start rapidly

Prokaryotic Organization

Operons (prokaryotes)
o Half the genes of E. coli are organized into operons
o Genes in an operon are grouped together by function
o Lac operon: encodes 3 enzymes required for catabolism of lactose (lac operon expression rises with lactose concentration)
o Trp operon: encodes 5 enzymes for biosynthesis of tryptophan (lack of tryptophan activates the trp operon)
E. coli
o For E. coli RNA polymerase to initiate transcription, it needs a sigma factor
  o Sigma factor: protein that recognizes the promoter sequence
  o Sigma factor 70 (SF70) is the most common one
  o SF70 binds at the promoter, which is therefore where RNA polymerase binds
o Operator sequence: a control element that the lac repressor protein binds to when it is not bound to lactose; this represses the lac operon by effectively blocking the start site
o Transcription is repressed when lactose is absent; lactose is an inducer
  o When lactose is present, it binds to the lac repressor and changes the repressor's conformation so that it releases the operator sequence
o Transcription of the lac operon stays low while glucose is present; low glucose levels activate the lac operon so that lactose is catabolized
o E. coli synthesizes cyclic AMP (cAMP) in response to low glucose levels
  o cAMP binds to and activates a transcriptional protein called CAP
  o CAP, when complexed with cAMP, binds to the CAP site
  o CAP-cAMP interacts with RNA polymerase and greatly stimulates the rate of transcription
More on sigma factors (SFs)
o SF54 recognizes promoters of genes involved in nitrogen metabolism
o Genes with SF54 promoters are regulated by enhancers 80-160 bp upstream, which are activated by NtrC (a DNA-binding protein)
o NtrC is activated by phosphorylation
Consensus sequence: the nucleotide that most typically appears at each position (e.g. of the TATA box)
Genes which are transcribed at extremely high levels (strong promoters) have a TATA box starting about 35 bp upstream of the start site
Some genes have an initiator element which includes a C at the -1 position and an A at the +1 position; no good consensus sequence has been globally established
Some genes initiate transcription at multiple sites within a 20-200 bp region; these genes lack a TATA box or an initiator element but contain CG-rich stretches (CpG islands) within ~100 bp of the start region
Promoter: TATA box or any other sequence that recruits Pol II to the transcription start site
Promoter-proximal elements (PPEs): other sequences near the promoter which help regulate transcription
o PPEs can be cell-type specific
Control elements are identified by a laboratory technique called linker-scanning mutation analysis, an excellent method for examining small stretches of DNA
In eukaryotes, the elements which regulate transcription initiation of a gene can span many kb
A laboratory technique called deletion analysis can locate important regulatory sequences that lie kb away
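The lac operon control logic described in this section (repressor released by lactose, CAP-cAMP formed only when glucose is low) reduces to a small truth table. The sketch below is a simplified illustration: real regulation is graded rather than on/off, and the function name and the three output labels are assumptions made for these notes.

```python
# Simplified truth-table sketch of lac operon control; names and labels are hypothetical.
def lac_operon_transcription(lactose_present: bool, glucose_present: bool) -> str:
    """Return a qualitative transcription level: 'off', 'low' or 'high'."""
    repressor_on_operator = not lactose_present   # lactose releases the repressor
    cap_camp_bound = not glucose_present          # low glucose -> cAMP -> CAP-cAMP at CAP site

    if repressor_on_operator:
        return "off"    # start site is blocked regardless of glucose
    if cap_camp_bound:
        return "high"   # CAP-cAMP stimulates RNA polymerase
    return "low"        # repressor released, but no CAP-cAMP stimulation
```

Usage: `lac_operon_transcription(lactose_present=True, glucose_present=False)` gives the only combination in which the operon is strongly transcribed.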

Genomics

C-value paradox: genome size does not correlate with biological complexity
G-value paradox: number of protein-coding genes does not correlate with biological complexity

Things to consider about the above two paradoxes: cis-regulation (?), alternative splicing, redundant genes, multi-functional proteins, post-translational modifications
DNA microarrays: thousands of gene-specific DNA sequences attached to a glass slide or gene chip, used to analyze global patterns of gene expression
o More than one array analysis is preferred
Sometimes a mutation copies a gene, and the copy then picks up a slight change that accidentally lets it function in a different way (could be good or bad)
Types of genomics
o Structural genomics: how the locus of genes differs and affects life
o Functional genomics: the function of each gene in specific processes
o Comparative, evolutionary genomics: evolution and genomic phenomena
o Nutrigenomics: predict diet regimes for personalized medicine based on genome analysis
o Pharmacogenomics: drugs for personalized medicine based on genome analysis
o Synthetic genomics: custom organisms designed for a purpose, e.g. twenty-four hour development of an H1N1 vaccine
o Phylogenomics: comparison of genomes of different organisms

Eukaryotic Organization

Eukaryotic enhancers can be >50 kb away from the genes they regulate
o Could be upstream or downstream, or even within an intron or in the final exon of a gene
o Are cell specific
Most eukaryotic genes are regulated by multiple transcriptional control elements
o Distinction between PPEs and enhancers is unclear
Many yeast genes have a regulatory element called UAS (upstream activating sequence)
Transcriptional control elements (e.g. enhancers) are binding sites for regulatory proteins (transcription factors; TFs)
o These proteins can be identified by biochemical techniques: DNase I footprinting, electrophoretic mobility shift assay (EMSA or gel shift)
Co-transfection assay: cultured cells are used to evaluate whether a protein, encoded by a known gene, is a TF
Transcription factors: proteins that stimulate or repress transcription of specific sets of genes; bind to PPEs and enhancers in eukaryotic DNA
TFs are modular proteins containing a single DNA-binding domain and one or more activation domains (for activators) or repression domains (for repressors)
Purification of transcription factors (?)
Transcriptional activators (TAs):
o Activators are modular proteins with distinct functional domains: a DNA-binding domain and an activation domain
o These domains interact with other proteins to stimulate transcription
o GAL4 activator (?)
o A fusion protein results when the DNA-binding domain of one transcriptional activator is fused to the activation domain of another
Transcriptional repressors (TRs):
o Functional converse of activators
o Most TRs are modular proteins with a DNA-binding domain and a repression domain

o Like activators, TRs interact with other proteins
Types of DNA-binding domains
o Zinc-finger motifs
  o C4 zinc finger: found in ~50 human TFs of the nuclear receptor family
  o Generally contain only two finger units but bind as homodimers (two identical polypeptides bound together)
  o Have two-fold rotational symmetry and bind to consensus DNA sequences that are inverted repeats
o Leucine zipper
  o Consensus has a leucine residue at every seventh position
  o Binds DNA as a dimer
  o Often are heterodimers
  o Related proteins have a different repeated hydrophobic amino acid; basic zipper (bZIP) is the term used for the larger family of proteins (?)
o Basic helix-loop-helix (bHLH)
  o Similar to basic zippers except that a nonhelical loop separates two alpha-helical regions
  o Different bHLH proteins can form heterodimers
How can this finite set of TFs generate enough regulatory diversity?
o Heterodimers: in some heterodimeric transcription factors, each monomer has a different DNA-binding specificity, so the number of combinatorial possibilities grows quickly (3 monomers can make 6 different dimers, 4 can make 10, etc.)
o Inhibitory factors can block DNA binding by some bZIP and bHLH monomers
o Cooperative binding of unrelated TFs to nearby sites (stimulating or blocking, indirectly)
Activation domains
o Less sequence consensus than DNA-binding domains
o High frequency of particular amino acids (Asp, Glu, Gln, Pro, Ser or Thr)
o Acidic activation domains (those with Asp or Glu) are activated when bound to a protein co-activator
  o E.g. CREB, RARgamma (?,?)
o IRF-3,7 (?, ?)
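The dimer counts quoted above (3 monomers give 6 dimers, 4 give 10) follow from choosing unordered pairs with repetition allowed, i.e. n(n+1)/2. A short sketch, assuming every monomer can pair with itself and with every other monomer (real TF families are more restricted); the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def possible_dimers(monomers):
    """List all homo- and heterodimers that a set of monomers could form.

    Assumes any monomer can pair with any other (and with itself),
    which is an idealization of real bZIP/bHLH families.
    """
    return list(combinations_with_replacement(monomers, 2))


def count_dimers(n: int) -> int:
    """Closed form for the same count: n homodimers + n(n-1)/2 heterodimers."""
    return n * (n + 1) // 2
```

Usage: `possible_dimers(["A", "B", "C"])` lists the three homodimers (A-A, B-B, C-C) and the three heterodimers (A-B, A-C, B-C).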

Post-transcription

Overview of post-transcriptional steps of gene expression
o 5' cap added
o Cleavage at the poly(A) site
o Poly(A) polymerase (PAP) adds 100-250 A residues
o RNA splicing removes introns and ligates exons
5' cap: 7-methylguanosine
o Added to the 5' (emerging) end of the nascent mRNA when it is 25-30 nucleotides long (i.e. while synthesis is still under way)
o This process is catalyzed by a dimeric capping enzyme which associates with the CTD of Pol II
  o One subunit removes the gamma-phosphate from the 5' end of the RNA
  o Another subunit transfers GMP from GTP (substrate) to the 5' diphosphate of the nascent transcript
  o A methyl group is then transferred to the N7 position of the guanine (by a separate methyltransferase, with S-adenosylmethionine as the methyl donor), giving 7-methylguanosine
RNA-binding domains
o Many were discovered in hnRNPs (heterogeneous nuclear ribonucleoproteins)

o hnRNP proteins all associate with pre-mRNAs
o RNA recognition motif (RRM): the most common RNA-binding domain
  o 80 AAs long; folds into a 4-stranded beta-sheet flanked by two alpha-helices, containing RNP1 and RNP2 motifs that contact the phosphates of RNA
o RGG box: contains 5 Arg-Gly-Gly repeats interspersed with aromatic AAs (Phe, Tyr, Trp)
  o Detailed structure unknown
o KH motif: 45 residues; similar structure to the RRM domain, except that RNA binds by interacting with a hydrophobic surface formed by the alpha-helices and one beta-strand
DNA-binding domains have common structural motifs (e.g. C2H2 and C4 zinc fingers, homeodomains, bHLH, bZIP), which all have one or more alpha-helices that interact with the major groove of DNA at the binding site
Regulatory regions of most genes contain binding sites for multiple transcription factors (therefore the level of transcription varies depending on the particular combination of TFs)
Combinatorial complexity of transcription control results from alternative combinations of monomers that form heterodimeric TFs and from cooperative binding of transcription factors to control sites
Activation and repression domains exhibit a diverse array of sequences and structures
All of the mentioned domains interact with other proteins called co-activators or co-repressors
Enhanceosome: multiprotein complex formed by cooperative binding of multiple activators to nearby sites in an enhancer
o Assembly often requires other small proteins that bind to the minor groove and sharply bend the DNA
Cleavage and polyadenylation are tightly coupled processes which both occur in several steps
Locus information for cleavage/poly(A)
o Poly(A) site: site where the poly(A) tail will be added after cleavage (~15 nucleotides downstream of the poly(A) signal, i.e. closer to the 3' end)
o Poly(A) signal = AAUAAA
o A second signal, rich in G/U, lies further towards the 3' end
Proteins associated with cleavage/poly(A) tail addition:
o CPSF = cleavage and polyadenylation specificity factor; RNA-binding protein; recognizes the AAUAAA sequence
o CStF = cleavage stimulatory factor; recognizes the G/U-rich sequence
  o These two proteins bind to each other, creating a kink in the RNA
o CFI, CFII = cleavage factors one and two; endonuclease activity; cleave the RNA at the poly(A) site
  o The G/U-rich sequence is therefore cut off and not present in the mRNA
  o However, the poly(A) signal is included in the mRNA
During assembly of the cleavage/polyadenylation proteins, the enzyme poly(A) polymerase (PAP) is recruited
o PAP only binds when all cleavage factors are present
o PAP binds before cleavage occurs and thereby links the two processes
o CFI and CFII cut the RNA (the G/U-rich fragment is degraded) and a new 3' OH is generated
Slow polyadenylation
o G/U-rich fragment released
o CStF released
o CFI released
o CFII released
o CPSF stays bound to PAP
o PAP uses ATP as substrate and energy source: it adds AMP residues to the RNA chain and releases pyrophosphate
o Addition of the first 12 A residues occurs very slowly
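The cleavage and polyadenylation steps above can be sketched on a toy sequence: find the AAUAAA signal, cut ~15 nucleotides downstream (so the downstream G/U-rich region is lost but the signal stays in the mRNA), then add a tail. The fixed offset of 15, the default tail length of 200 and the function name are assumptions chosen for illustration; real cleavage-site spacing varies.

```python
# Toy sketch of 3' end processing; offset and tail length are illustrative assumptions.
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna: str, offset: int = 15, tail_length: int = 200) -> str:
    """Cleave `offset` nucleotides downstream of AAUAAA and append a poly(A) tail."""
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA poly(A) signal found")

    cleavage_site = signal_start + len(POLY_A_SIGNAL) + offset
    if cleavage_site > len(pre_mrna):
        raise ValueError("transcript ends before the expected poly(A) site")

    cleaved = pre_mrna[:cleavage_site]   # keeps the signal, drops downstream G/U-rich RNA
    return cleaved + "A" * tail_length   # PAP adds A residues to the new 3' OH
```

Usage: for a transcript built as upstream sequence + `AAUAAA` + 15 nucleotides + a G/U-rich stretch, the result still contains `AAUAAA` but no longer contains the G/U-rich stretch.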

Binding of PABPII (poly(A)-binding protein II) to the poly(A) tail via its RRM (RNA recognition motif) domain greatly speeds up the process
o Also known as rapid polyadenylation
o PABPII signals PAP to stop adding A residues after 100-250 [unknown mechanism]
EM imaging provides direct evidence for RNA splicing
o A segment of DNA was denatured and hybridized to the corresponding RNA isolated from infected cells
o The DNA-RNA hybrids were examined
o In eukaryotic viruses, the mRNA is not colinear with the DNA that encodes it
Several key residues near the splice sites are required for splicing; most of the intron sequence, however, is unnecessary
o The first two nucleotides of an intron are always GU, and the last two (3' end) are always AG
o Pyrimidine-rich region of ~15 nucleotides (C or U)
o Branch point (adenosine)
o Branch point to 3' splice site is typically 20-50 nucleotides
Splicing occurs via two consecutive transesterification reactions
o Difference between ribose and deoxyribose: ribose has a 2' OH
o 1st transesterification: the 2' OH group of the branch point A attacks the phosphate at the 5' end of the intron; the 3' OH of the upstream exon becomes free
o 2nd transesterification: the newly freed 3' OH at the end of the exon attacks the phosphate at the 3' splice site, joining the exons and releasing the excised lariat intron (with a tail whose length is the distance between branch point and 3' splice site)
Splicing is catalyzed by snRNPs (small nuclear ribonucleoprotein particles)
o Base pairing between pre-mRNA, U1 snRNA and U2 snRNA is essential for splicing
o snRNPs have an snRNA component and many associated proteins
o U1 snRNA: short RNA (~165 nucleotides); at its 5' end it has a sequence complementary to the sequence at the 5' splice site, so this RNA can base-pair with pre-mRNA
o U2 snRNA: has a series of nucleotides that can base-pair with the sequence around the branch point, but never with the branch point A itself
o U1 snRNP binds to sequences at the 5' splice site
o U2 snRNP binds to sequences around the branch point
o Three other snRNPs associate to form the spliceosome: U4, U6, U5
Spliceosome
o ~70 proteins, some of which are associated with snRNPs and others not
o Protein U2AF has two subunits: one (U2AF65) binds to the pyrimidine-rich region near the 3' end of introns and to the U2 snRNP; the other (U2AF35) binds to the AG at the 3' end of the intron; thus U2AF promotes interaction of the U2 snRNP with the branch point
o Assembly of the spliceosome brings the two ends of the intron together
o Release of U1 and U4 makes the spliceosome catalytically active
o First transesterification requires U2 and U6
o Second transesterification joins the 5' and 3' exons by a standard 3',5'-phosphodiester bond
o Debranching enzyme converts the lariat intron into linear intron RNA, which is then degraded into monomers
The CTD of Pol II, when unfolded, is very long in comparison with the globular part (it associates with splicing and polyadenylation factors, thus linking these processes with transcription)
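The sequence rules for introns noted above (GU at the 5' end, AG at the 3' end) and the net outcome of the two transesterifications (exons joined, intron released) can be sketched on a toy pre-mRNA. This is only an illustration: it checks the two dinucleotides and nothing else, the caller supplies the intron boundaries, and the function name is hypothetical.

```python
# Toy sketch of intron removal; does not model the branch point or the spliceosome.
def splice(pre_mrna: str, intron_start: int, intron_end: int):
    """Remove pre_mrna[intron_start:intron_end] and ligate the flanking exons.

    Returns (mature_rna, excised_intron). Raises ValueError if the intron
    does not begin with GU and end with AG.
    """
    intron = pre_mrna[intron_start:intron_end]
    if not intron.startswith("GU"):
        raise ValueError("intron must begin with GU at its 5' end")
    if not intron.endswith("AG"):
        raise ValueError("intron must end with AG at its 3' end")

    # Net result of the two transesterifications: exons joined, intron released.
    mature = pre_mrna[:intron_start] + pre_mrna[intron_end:]
    return mature, intron
```

Usage: with `exon1 + intron + exon2` as input and the intron coordinates supplied, the first returned value is `exon1 + exon2`.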

Interaction of SR (Ser-Arg rich) proteins with exonic splicing enhancers (ESEs) helps define the ends of exons (true splice sites)
o SR proteins bind to particular sites in exons, interact with each other, and cooperatively bind the small subunit of U2AF and subcomponents of the U1 snRNP
Group II introns are found in mitochondrial and chloroplast genes and are thought to be the evolutionary predecessors of introns
o Group II introns are very sensitive to mutations
Polyadenylation and splicing occur at the same time

Drosophila

Sex determination pathway in Drosophila
o Alternative splicing controls sex determination in Drosophila
o Sex-lethal gene (Sxl) pre-mRNA is spliced differently in males and females
  o In females, the 3' end of exon 2 is joined to the 5' end of exon 4
  o In males, a longer transcript is made (exon 3 is retained)
o Exon 3 contains a stop codon; thus females make a longer protein, which is functional
o Sxl protein in females is an RNA-binding protein which regulates alternative splicing: Sxl binds to the pre-mRNA of transformer (tra)
o tra pre-mRNA is alternatively spliced; in females the mRNA is shorter and encodes the functional protein
o Transformer protein promotes female-specific splicing of another downstream pre-mRNA, dsx (joining exon 3 to exon 4)
o Control of Sxl expression
  o Sxl is under transcriptional control: it is expressed only in females during early embryogenesis
  o Later in development, the female-specific promoter is repressed and a different Sxl promoter is activated that is on in both sexes
  o However, Sxl pre-mRNA is alternatively spliced depending on the presence of Sxl protein
o How Sxl protein regulates splicing:
  o Sxl binds to a sequence near the 3' end of the intron between exons 2 and 3 and blocks association of U2AF and the U2 snRNP (thus Sxl represses a particular splice site)
  o U1 snRNP binds properly at the 3' end of exon 2 but assembles into a spliceosome with U2 snRNP bound to the branch point at the 3' end of the intron between exons 3 and 4; thus exon 2 gets spliced to exon 4 and exon 3 is removed as part of a larger intron
o Sex-determining cascade
  o Sxl regulates tra pre-mRNA by the same mechanism
  o Tra regulates splicing of dsx pre-mRNA: only females have Tra; it forms a complex with Rbp1 and Tra2 that directs splicing of exon 3 to exon 4 and promotes cleavage and polyadenylation at an alternative poly(A) site at the 3' end of exon 4
  o Males have no Tra, so exon 4 is skipped; exon 3 is spliced to exon 5, and polyadenylation occurs downstream of exon 6
  o Different forms of Dsx (a transcriptional repressor) are produced in male and female embryos
  o Tra/Rbp1/Tra2 activate a particular splice site by binding to exon 4 and recruiting U2AF and U2 snRNP to the 3' end of the intron between exons 3 and 4; splicing regulators can work as activators as well as repressors
Extensive alternative splicing leads to production of many isoforms of Slo protein in the nervous system; these isoforms can respond to different sound frequencies (the Slo gene was first identified in Drosophila)
RNA editing can also result in the production of alternative protein isoforms, e.g. the liver and intestine ApoB proteins; changes are made to the nucleotide composition of the RNA after it has been transcribed (CAA in exon 26 is edited to UAA, a stop codon)
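The ApoB editing example above (a single C→U change turns CAA into the stop codon UAA) can be illustrated on a toy reading frame. The sequence in the usage note is invented and is not the real ApoB mRNA; the function names are hypothetical.

```python
# Toy illustration of C->U RNA editing creating a premature stop codon.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_codon(rna: str):
    """Return the index (0-based, in codons) of the first in-frame stop codon, or None."""
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def edit_c_to_u(rna: str, position: int) -> str:
    """Return a copy of `rna` with the C at `position` changed to U."""
    if rna[position] != "C":
        raise ValueError("editing site must be a C")
    return rna[:position] + "U" + rna[position + 1:]
```

Usage: for the invented frame `AUG GCU CAA GGU UGA`, the unedited RNA is read through to the final UGA, while editing the C of CAA produces UAA and the reading frame ends two codons earlier, giving a shorter protein from the same gene.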

Nuclear Cytoplasmic Membrane Transport

Nuclear pore complex (NPC): site of nucleocytoplasmic transport
o Resembles a basket
o Highly ordered
o The basket has a pore, through which molecules may pass
o Cytoplasmic filaments stick out into the cytoplasm
o Approx. 125 MDa (30x bigger than ribosomes)
o Can accommodate molecules up to ~60 kDa on a free-diffusion basis
o Larger molecules and multimolecular complexes (e.g. RNPs) must be actively transported through the NPC
All nuclear proteins are synthesized in the cytoplasm and imported through NPCs
o These proteins contain a nuclear localization signal (NLS)
o Adding an NLS to a cytoplasmic protein (pyruvate kinase) targets it to the nucleus
o Import can be studied in cells permeabilized with digitonin (a mild detergent)
Importin alpha and beta
o Heterodimeric nuclear-import receptors
o Alpha binds to the NLS
o Beta binds to FG-nucleoporins
Nuclear transport factor 2 (NTF2): binds to Ran-GDP and to the FG repeats of FG-nucleoporins, then returns Ran to the nucleus
Mechanism for nuclear import of NLS-containing cargo proteins
o Cargo protein binds to importin in the cytoplasm and forms the cargo complex
o The cargo complex moves through the nuclear pore via importin's interaction with FG-nucleoporins
o In the nucleoplasm, Ran-GTP binds to importin, and importin releases the cargo
o Importin carries Ran-GTP back through the FG-nucleoporins to the cytoplasm
o A GAP protein in the cytoplasm stimulates hydrolysis of GTP, converting Ran-GTP to Ran-GDP (which recycles)
o NTF2 brings Ran-GDP back into the nucleoplasm, where the nuclear protein GEF (guanine nucleotide exchange factor) exchanges the bound GDP for GTP (GEF does not hydrolyze GTP)
o This changes the conformation of Ran so that it no longer binds NTF2
o The cycle makes transport directional and couples it to metabolic energy
Nuclear export uses a very similar mechanism
o The cargo complex, with exportin 1 bound to Ran-GTP, passes through the NPC; GAP acts on Ran-GTP in the cytoplasm and the entire complex dissociates
o Exportin 1 and Ran-GDP are recycled to the nucleoplasm (where GEF converts Ran-GDP back to Ran-GTP)
