nerdy-notes.com
Uploaded by user: Tashy
Class: BIO 312
Lecture/Exam: Final Exam
School: SBU
Semester: Fall 2013
Professor: --

BIO 312 Fall 2013 Final Review

Gene: a DNA segment responsible for making functional products
Functional products: proteins and functional RNA molecules (RNAi, rRNA, snRNA, tRNA)
Approximately 20,000 human protein-coding genes (known genes)
o 9 exons per gene (122 bp)
o 27 kb gene length
o 360 introns
o 1.4 kb ORF length
o 3 transcripts per gene
Repeats: "junk DNA", transposable elements
Pseudogenes: look-alike genes

Prokaryote gene models
Small, one transcript, no introns, one ORF per gene (start to stop)

Eukaryote gene models
Only the mature RNA contains the ORF; the regions before the start codon and after the stop codon are UTRs
Expressed sequence tags (ESTs) represent sequences from expressed genes
If a region matches an EST with high stringency, then the region is a gene or pseudogene

3 major categories of information used in gene-finding programs
o Signals and features: sequence patterns that have significance (splice donor and acceptor sites, start and stop codons)
o Content/composition: statistical properties that distinguish coding from noncoding regions
o Similarities: compare the DNA sequence to the database

Look for protein-coding genes
o Look for ORFs (start codon to stop codon) = 60-100 aa
o Basal signals
o Regulatory signals

Open reading frame (ORF): any stretch of DNA that potentially encodes a protein
First signal that a segment of DNA may be part of a functional gene
Can overlap
The longest ORFs are the most likely to be protein-coding regions
Low GC content

Phylogenetic relationship
Two lineages are more closely related to each other than to another lineage if they share a more recent common ancestor
Homologs: gene copies that share a common ancestor
Paralogs: gene copies that arise as a result of gene duplication
Orthologs: gene copies that arise as a result of speciation
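The ORF search described above (a start codon followed in-frame by a stop codon, with a minimum length) can be sketched in Python. This is only an illustration under stated assumptions: the function name `find_orfs` and the `min_aa` parameter are hypothetical, only the forward strand is scanned, and ATG is taken as the only start codon.

```python
# Minimal ORF scan (illustrative sketch, not a real gene finder).
# Assumptions: forward strand only, ATG as the only start codon,
# and a hypothetical min_aa cutoff (the notes mention 60-100 aa).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_aa=60):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    start is the index of the A in ATG, end is one past the stop codon,
    and an ORF is kept only if it has at least min_aa codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy sequence: one short ORF in frame 2 (ATG + 5 codons + TAA).
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_aa=3))
```

Because ORFs in different frames are collected independently, overlapping ORFs are reported separately, which matches the note that ORFs can overlap.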

Identity: in a sequence alignment, refers to when two aligned residues are identical (# matches / total)
Percent identity: percentage of residues that are identical between the aligned sequences
Gap: a position in one sequence where no residue is shown; the alignment proposes a deletion or insertion in the sequence at that position
Gap penalty: a numerical penalty for introducing a gap in one sequence with respect to the second sequence in order to align residues
Gap opening penalty: penalty for starting a gap in a sequence
Gap extension penalty: a penalty for adding gaps to an existing gap, so that more than one residue is not aligned or is aligned to gaps
RULE: always insert gaps in protein-coding sequences in groups of 3

FASTA FORMAT:
>Name of sequence 1
ATTCGTTA

GenBank: open-access annotated collection of all available nucleotide sequences and their protein translations
Database that contains every publicly available DNA sequence
Most biological database
CDS: properties of the coding sequence that codes for the protein
Protein: translated mRNA sequences
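As a small worked example of the definitions above, the sketch below computes percent identity and a gap cost with separate opening and extension penalties. The function names and the default penalty values are hypothetical, and "total" is assumed to mean the alignment length (the notes do not say which total is meant).

```python
# Illustrative helpers for the alignment vocabulary above.
# Assumptions: both strings are already aligned (same length, '-' marks a gap),
# and "total" in "# matches / total" is the alignment length.

def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)

def affine_gap_cost(gap_length, gap_open=-10, gap_extend=-1):
    """Cost of one gap: opening penalty once, extension for each extra position.

    The default penalties are placeholders, not values from the notes.
    """
    if gap_length <= 0:
        return 0
    return gap_open + (gap_length - 1) * gap_extend

print(percent_identity("ATT-CG", "ATTACG"))  # 5 matches over 6 columns
print(affine_gap_cost(3))                    # one opening + two extensions
```

For protein-coding DNA, the rule above means `gap_length` would normally be a multiple of 3 so the reading frame is preserved.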

NCBI: database collaboration
GO annotation: a statement that a gene product has a particular molecular function, is involved in a particular biological process, and is located within a particular cellular component

Major sequencing limitations
o Quantity of the nucleic acids
o Length of the sequence read in a single reaction
o Primer sequence

Hierarchical sequencing: align large BAC or P1 clones, then fragment and sequence a subset of the clones
Top-down sequencing
o Construct maps of ordered landmarks; this provides a long-range map and organizes it into individual chromosomes
o Physical maps of overlapping clones anchored to the landmark maps
o Tile path
o Shotgun sequencing and assembly
Cloning vectors

Shotgun sequencing: fragment and sequence the entire genome
Bottom-up sequencing
o Shotgun sequencing of short-insert clones
o Paired-end sequencing of large-insert clones
o Assembly of seed contigs
o Incorporation of other sequences and integration of long-range data

Maxam-Gilbert method: the sequence is end-labeled with radioactive phosphate and chemically cleaved to leave a signature pattern of bands
Sanger method: the DNA sequence is annealed to an oligonucleotide primer, which is then extended by a DNA polymerase
The DNA to be sequenced acts as a template for the enzymatic synthesis of a new DNA strand starting at a defined primer
Incorporation of a dideoxynucleotide blocks further synthesis of the new DNA strand
Decreasing the temperature allows the primer to stick to the intended location
DNA polymerase starts elongating the primer
Most of the time when a T is required to make the new strand, the enzyme will get a good one and continue to elongate

The alignment score of two sequences is the log-likelihood ratio of the alignment under ancestry versus by chance:
log2( Pij(ancestry) / (2 * Pij(chance)) )

Needleman-Wunsch alignment
Start with 0 in the top-left cell, then fill in the first row and first column by repeatedly adding the gap penalty (sequence two is on top and sequence one is on the left)
For every other cell, compute three candidate scores:
o Diagonal: take that score and, if it is a match, add the match score; if it is a mismatch, add the mismatch score
o Left: take that score and add the gap penalty
o Top: take that score and add the gap penalty

Then take the highest score from these three values
The score at the bottom right is also known as the optimal alignment score
Traceback: draw arrows from the optimal alignment box back until you can't anymore, determining each time where the score in your box came from
o If the arrow is diagonal, align the two characters
o If the arrow is horizontal, the sequence on the left (sequence one) has a gap
o If the arrow is vertical, the sequence on top (sequence two) has a gap

What is RNA secondary structure/folding?
Secondary structure prediction is more straightforward and the energies involved are greater
Adenine to uracil has two hydrogen bonds
Cytosine to guanine has three hydrogen bonds (more stable)
Bonds form between canonical base pairs; G is a wobble base, so it can also pair with U
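The Needleman-Wunsch fill and traceback rules described earlier in these notes can be sketched in Python as follows. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not values from the course. Sequence one indexes the rows (left side) and sequence two indexes the columns (top), as in the notes.

```python
# Needleman-Wunsch global alignment (illustrative sketch).
# Assumed scoring: match=+1, mismatch=-1, linear gap=-2 (placeholders).

def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: start at 0 and keep adding the gap penalty.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Match/mismatch score for aligning seq1[i-1] with seq2[j-1].
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Fill: each cell is the best of diagonal, top, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # diagonal
                score[i - 1][j] + gap,            # from the top
                score[i][j - 1] + gap,            # from the left
            )

    # Traceback from the bottom-right cell to the top-left cell.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Vertical move: seq1 residue aligned to a gap in seq2.
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            # Horizontal move: seq2 residue aligned to a gap in seq1.
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))

print(needleman_wunsch("ACGT", "AGT"))
```

When several alignments share the optimal score, this traceback prefers the diagonal move, then the vertical move; a different tie-breaking order would return a different but equally scoring alignment.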

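Secondary structure elements:
1. hairpin loop
2. multibranch loop
3. helices
4. bulge loop
5. internal loop

4 possible ways to get the best possible structure
o Unpaired base i (left side)
o Unpaired base j (right side)
o Adding the i,j pair
o Bifurcation: combining optimal substructures

Nussinov-Jacobson algorithm
0 for the first two diagonals
Diagonal down: take that value (0 at the start), then add 1 if the two bases can pair
Left: take the value from the left
Take the value from the bottom too
Or (bifurcation) take the value of a cell across in the same row, starting from the main diagonal, and the value of the cell below it in the current column, and add them
o Example for cell (4,7) with k = 5: (4,5) + ((5+1),7)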
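The four fill cases above can be sketched as a Python scoring-matrix fill. This is a sketch under assumptions: the function name is hypothetical, G-U wobble pairs count as pairs, no minimum hairpin loop length is enforced, and only the matrix fill is shown (no traceback).

```python
# Nussinov-style fill: maximum number of nested base pairs (illustrative sketch).
# Assumptions: Watson-Crick pairs plus G-U wobble, no minimum loop length.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_fill(rna):
    n = len(rna)
    # N[i][j] = best pair count for the subsequence rna[i..j]; cells on and
    # below the main diagonal stay 0.
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):            # fill one diagonal at a time
        for i in range(n - span):
            j = i + span
            best = max(
                N[i + 1][j],            # base i unpaired (value from the bottom)
                N[i][j - 1],            # base j unpaired (value from the left)
                N[i + 1][j - 1] + (1 if (rna[i], rna[j]) in PAIRS else 0),  # i pairs with j
            )
            for k in range(i + 1, j):   # bifurcation: i < k < j
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N

matrix = nussinov_fill("GCAU")
print(matrix[0][-1])  # top-right cell holds the score for the whole sequence
```

The top-right cell of the matrix is the starting point for the traceback that recovers which bases are paired.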

For cell (4,7) the split index k must satisfy 4 < k < 7

Traceback
o Check where the values are the same or if the base pair matches
o Every diagonal step = match
o Nucleotides start from 0; then draw lines between the optimal base pairs
o Then label the other bases as loops
o Row > column: did the 2/1 step

Building a Phylogeny (Matrix)
Count the number of differences between each of the combinations
Pair up the combination with the least number of differences
Take the mean (average) distance between the pair and the others
Continue until you have a tree
Calculate the mean branch length from the tree: the distance between each species found on the matrix, divided by 2

R programming
myobject = blahblah: assigns blahblah to the name myobject
s = c(1:10) or s = seq(1, 10, 1): the sequence 1 to 10 by 1
rep(x, length = 10): repeats x to fill a vector of length 10 (10 times for a single value)
co2 = c()
o min(co2): minimum
o max(co2): maximum
o mean(co2): mean
o median(co2): median
o var(co2): variance
o sd(co2): standard deviation
o range(co2): range
o quantile(co2): quantiles (percentiles)
length(co2): number of values
nyears = length(co2)
years = seq(from = 1959, length = nyears)
co2 = data.frame(years, co2): data frame
id <- c(,,,)
age <- c(,,,)
sex <- c(,,,)
o dat.df <- data.frame(id, age, sex)
o head(dat.df, n = 2)
An empty position before the comma means no row is singled out (all rows are kept)
dat.df[, 2, drop = FALSE]: column 2, kept as a data frame
dat.df[c(1, 4), ]: extracts rows 1 and 4

Comparison and logical operators
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
== Equal to

!= Not equal to
& And (element-wise)
&& And (evaluates left to right only until the result is determined)
| Or (element-wise)
|| Or (evaluates left to right only until the result is determined)
! Not

Northern blot: RNA
Western blot: protein
Southern blot: DNA
o Unknown mixture of fragmented DNA is run on a gel and transferred to nitrocellulose paper
o Known DNA gene is labeled
o Known solution is applied to the membrane
PCR: full-length genes
Eastern blot: post-translational modification

Causes of diseases
Environmental: correlation vs causation
Genetic
o Causality is clear
o Finite number of polymorphisms
Microarrays: method to genotype polymorphisms
HapMap project: known polymorphisms

Parsimony
Character optimization: minimize the number of steps on a tree
OR parallelism: 2 separate origins
Homoplastic alternatives: equally parsimonious optimizations
Acctran: accelerated transformation / Deltran: delayed transformation
Long branches are attracted, but the similarities are homoplastic
Parsimony is inconsistent
Branch lengths are not the same
o Less than 10 taxa
Branch and bound: discarding families of trees during tree construction; 18 taxa
Heuristics
o The number of possible trees increases faster than exponentially with the number of taxa
o Search tree space by building or selecting an initial tree and swapping branches
Nearest-neighbor interchange: two species can interchange and not disrupt the tree
Subtree pruning and regrafting: part of the tree is removed and placed on the other side
Tree bisection and reconnection: cut the tree in half and place it where G was
Majority-rule consensus method: agreement across a majority of the fundamental trees
Relationships that are not the most parsimonious interpretation of the data
Include all and only the full splits that are found in the majority of the fundamental trees
Bayesian phylogenetic analysis: all sites are informative
Maximum likelihood estimate: maximize the probability of the data
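To see why exhaustive search is only practical for small numbers of taxa, the growth in the number of possible trees can be computed directly. The sketch below uses the standard counts for strictly bifurcating trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n taxa. The function names are hypothetical.

```python
# Number of distinct bifurcating tree topologies for n labeled taxa
# (illustrative sketch; function names are made up for this example).

def odd_double_factorial(m):
    """m!! = m * (m-2) * (m-4) * ... * 1 for odd m; returns 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def num_unrooted_trees(n_taxa):
    # (2n - 5)!! unrooted topologies; a single topology for 3 or fewer taxa.
    return 1 if n_taxa < 3 else odd_double_factorial(2 * n_taxa - 5)

def num_rooted_trees(n_taxa):
    # (2n - 3)!! rooted topologies; a single topology for fewer than 2 taxa.
    return 1 if n_taxa < 2 else odd_double_factorial(2 * n_taxa - 3)

for n in (4, 5, 10, 18):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Each added taxon multiplies the count by a larger odd number, which is why branch and bound and then heuristic branch swapping take over as the number of taxa grows.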

Distance-based method: find a tree such that the branch lengths of the paths between sequences fit a matrix of pairwise distances between the sequences

BLOSUM substitution matrices: blocks substitution matrix
o Sequences above a threshold similarity are clustered
o 62%
o Range of evolutionary periods
o Conserved blocks
o Conserved domains
PAM substitution matrices: percent accepted mutations (ONLY MUTATIONS), accepted by natural selection
o More than 85% identical: minimizes ambiguity and the number of mutations
o Extrapolation from a small evolutionary period
o Track evolutionary origins
o Homologous sequences during evolution
PAM1/BLOSUM100: small evolutionary distances, strong similarities over short sequences
PAM250/BLOSUM30: large evolutionary distances, weak similarity over stretched lengths

Sequence alignment: arrange DNA or protein sequences to identify regions of similarity that we hypothesize are a consequence of an evolutionary relationship
Based on matches, mismatches and gaps

Parsimony: the phylogenetic tree that explains the observed sequences with the minimal number of substitutions along the branches, together with the nucleotides at the ancestral nodes

UPGMA
The algorithm produces an ultrametric tree: the distance from the root to every leaf is the same
Constant molecular clock: all species accumulate mutations at the same rate
Builds a rooted tree
Polynomial-time tree construction
Additive if there exists a tree; strong molecular clock hypothesis
Disadvantage: fails when trees do not follow the molecular clock
Distance: uses all characters, not just shared ones; no ancestral states and transitions
Molecular clock: change is constant with time

Genome-wide association studies: find alleles that are linked to positive loci and follow the alleles
Alleles common between the populations that have been assayed: populations may not have many alleles that are common
Marker loci: do not cause a positive SNP
Contingency table: the allele inherited at the marker can be the allele in the disease = linked to each other: linkage disequilibrium
o Alleles are co-inherited
o Because the alteration and the marker are linked together, one can use the linked marker locus to help identify whether or not you have the disease
Polymorphisms are the basis for the genetic component of disease risk variability
The genetic component often explains more than half of disease
Look for genetic differences between people with and without the disease

o Microarray: genotype polymorphisms
o HapMap project: common polymorphisms

Identify which region is associated with the disease or phenotype
o Identify population structure
o Select case subjects
o Select controls
o Genotype a million SNPs
o Determine which SNP is associated
Chi-square test: find SNPs that deviate from the independence assumption
RANK SNPs BY P-VALUES
Odds ratio: ratio of two odds = odds(A) / odds(B), where odds(X) = Pr(X) / (1 - Pr(X))

Predict disease risk: yes, though GWAS explain only a few percent of the genetic component
Polymorphism that causes the disease: no, GWAS uses SNPs in linkage disequilibrium with the causal polymorphism
What genes are involved
Understanding disease biology

Candidate genes are always guessed
Almost never any evidence implicating a particular gene
Transcriptional enhancers can act at a long distance
Association between phenotype and polymorphism
GWAS + eQTL implicate disease genes in an unbiased way
Splicing eQTLs reveal many more GWAS overlaps than whole-transcript QTLs = important role in autoimmune disease

DNA methylation and histone modification help to compartmentalize the genome into domains of different transcriptional activity
Euchromatin: high histone acetylation, low DNA methylation, H3K4 methylation
Heterochromatin: low histone acetylation, dense DNA methylation, H3K9 methylation

Some segments of DNA are regulatory
Regulating molecules called transcription factors bind regulatory regions and either increase the rate of transcription or decrease it
Prokaryotes: an inhibitor binds to the regulatory region and blocks RNA polymerase from binding to the gene promoter
An activator binds to the regulatory region and recruits RNA polymerase to the promoter
Eukaryotes: regulatory regions are far removed from the coding part of the gene

eQTL: the expression state associated with a specific SNP connects a gene with the disease (which gene the SNP potentially affects)
Unbiased, systematic method to infer disease genes
If SNP X affects disease Z and also has an effect on gene Y, then Y is a disease gene

Gene numbers differ by a factor of 5
An inherent error in the replication and recombination process of DNA is the duplication and deletion of spans of DNA
These vary from a couple of nucleotides to entire genes or clusters
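The chi-square test and odds ratio used to rank SNPs in the association study notes above can be worked through on a single 2x2 contingency table. The sketch below is only an illustration: the function names and the allele counts are made up, and the p-value uses the fact that a chi-square statistic with 1 degree of freedom has tail probability erfc(sqrt(x / 2)).

```python
import math

# 2x2 table for one SNP (hypothetical counts):
#              allele A   allele a
#   cases         a          b
#   controls      c          d

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for independence of row and column."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def odds_ratio(a, b, c, d):
    """Odds of allele A in cases divided by odds of allele A in controls."""
    return (a / b) / (c / d)

# Made-up example: 60/40 allele counts in cases, 40/60 in controls.
chi2 = chi_square_2x2(60, 40, 40, 60)
print(chi2, p_value_1df(chi2), odds_ratio(60, 40, 40, 60))
```

In a genome-wide scan this calculation would be repeated for every SNP and the SNPs ranked by p-value; the sketch does not handle zero counts, which would cause a division by zero.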

Two lineages are more closely related to each other if they share a more recent common ancestor
Homoplasy: the result of independent evolution, not homologous
Different chemical and physical properties are due to side chains

Catalytic RNA: ribosomal and transfer RNA
RNA-mediated gene regulation: micro and short interfering RNA
Alternative splicing: small nuclear RNA

Secondary structure prediction is more straightforward than tertiary prediction
Energies involved in secondary structures are greater than in tertiary = more stable
Comparative sequence analysis
o During evolution, the secondary structure of functional RNA is conserved better than the primary structure
o Align a set of phylogenetically ordered homologous sequences
o Invariance of certain sections identifies them as being important to structure and function
Dynamic programming
o Recursive computation
o Focus of the algorithm by Nussinov
Structure representation
o Secondary structure described as a graph
o Base pairs are described as edges, indicating links between base vertices
o Base constraints
o Each edge contains vertices

Guilt by association: assumption that genes with the same pattern of changes in expression are involved in the same pathway
Tumor classification: predict outcome / prescribe appropriate treatment based on clustering with tumors of known outcome
Gene expression microarray: hybridize cDNA
Ratio is 1 for unchanged expression, 0-1 for a downregulated gene, and greater than 1 for an upregulated gene

RNA sequencing
Characterize novel transcripts and splicing variants, and profile the expression level of known transcripts
Higher resolution than whole-genome array analysis
Apply the same experimental protocol to various purposes
o Detecting SNPs
o Mapping exons
o Detecting gene fusions
Next-generation sequencing is challenging microarrays as the tool of choice for genome analysis

IRF5: a transcription factor that acts downstream of Toll-like receptors
Type 1 diabetes SNP = C-type lectin involved in immune response
Rheumatoid arthritis: TRAF1 and C5
TRAF1 is a downstream effector of TNFR1/2

Model organisms
o Small, easy to maintain
o Short life cycle
o Large progeny
o Well-studied life cycle