Data Analyst Nanodegree Syllabus

Discover Insights from Data

Before You Start
Prerequisites: Thank you for your interest in the Data Analyst Nanodegree! In order to succeed in this
program, we recommend having experience programming in Python. If you've never programmed before, or
want a refresher, there is an Introduction to Python Programming in the extracurricular section of the
nanodegree program.

Educational Objectives: Learn to organize data, uncover patterns and insights, make predictions using
machine learning, and clearly communicate critical findings.

Length of Program*: 260 Hours
Frequency of Classes: Self-paced
Textbooks required: None
Instructional Tools Available: Video lectures, 1:1 appointments, forum support

*This is a self-paced program and the length is an estimation of total hours the average student may take to
complete all required coursework, including lecture and project time. Actual hours may vary.

Intro Project: Analyze Bay Area Bike Share Data (10 hrs)
This project will introduce you to the key steps of the data analysis process. You'll do so by analyzing data
from a bike share company found in the San Francisco Bay Area. You'll submit this project in your first 7
days, and by the end you'll be able to:

Use basic Python code to clean a dataset for analysis
Run code to create visualizations from the wrangled data
Analyze trends shown in the visualizations and report your conclusions
Determine if this program is a good fit for your time and talents

Project: Compute Statistics from Card Draws (20 hrs)
In this project, you will demonstrate your knowledge of descriptive statistics by conducting an experiment
dealing with drawing from a deck of playing cards and creating a write-up containing your findings. This
project is self-graded.

Supporting Lesson Content: Statistics

Lesson Title            Learning Outcomes

INTRO TO RESEARCH       Identify several statistical study methods and describe the positives and negatives of each
METHODS

VISUALIZING DATA        Create and interpret histograms, bar charts, and frequency plots

CENTRAL TENDENCY        Compute and interpret the 3 measures of center for distributions: the mean, median, and mode

VARIABILITY             Quantify the spread of data using the range and standard deviation
                        Identify outliers in datasets using the interquartile range

STANDARDIZING           Convert distributions into the standard normal distribution using the Z-score
                        Compute proportions using standardized distributions

NORMAL DISTRIBUTION     Use normal distributions to compute probabilities
                        Use the Z-table to look up the proportions of observations above, below, or in between values

SAMPLING                Apply the concepts of probability and normalization to sample datasets
DISTRIBUTIONS
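
As a rough illustration of the measures named in these lessons, the sketch below computes the three measures of center, Z-scores, and interquartile-range outliers using only Python's standard library. The sample values are made up for illustration, the 1.5 x IQR cutoff is a common rule of thumb rather than something this syllabus specifies, and the population standard deviation (`pstdev`) is an assumed choice.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample values, invented purely for illustration.
values = [2, 4, 4, 5, 5, 5, 6, 6, 7, 30]

# The three measures of center.
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Spread: range and (population) standard deviation.
value_range = max(values) - min(values)
stdev = statistics.pstdev(values)

# Z-score: how many standard deviations each value lies from the mean.
z_scores = [(v - mean) / stdev for v in values]

# Quartiles and the 1.5 * IQR rule of thumb for flagging outliers.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

# Proportion of a standard normal distribution below z = 1.0,
# the kind of value you would otherwise look up in a Z-table.
proportion_below = NormalDist().cdf(1.0)

print(mean, median, mode, outliers, round(proportion_below, 4))
```

Note how the single extreme value pulls the mean well above the median, which is one reason the lessons cover all three measures of center.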

Project: Investigate a Dataset (30 hrs)
In this project, you'll choose one of Udacity's curated datasets and investigate it using NumPy and pandas.
You'll complete the entire data analysis process, starting by posing a question and finishing by sharing your
findings.

Supporting Lesson Content: Introduction to Data Analysis

Lesson Title            Learning Outcomes

DATA ANALYSIS PROCESS   Identify the key steps in the data analysis process
                        Complete an analysis of Udacity student data using pure Python, with minimal reliance on additional libraries

NUMPY AND PANDAS        Use NumPy arrays, pandas series, and vectorized operations to ease the data analysis process
FOR 1D DATA

NUMPY AND PANDAS        Use two-dimensional NumPy arrays and pandas DataFrames
FOR 2D DATA             Understand how to group data and to combine data from multiple files
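
To give a feel for what a "pure Python" analysis with grouping might look like, here is a minimal sketch that reads CSV text and summarizes it per group with only the standard library. The column names and records are hypothetical and are not taken from the actual course data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical engagement records, invented for illustration only.
raw = """account,day,minutes
a1,2015-01-01,30
a1,2015-01-02,45
a2,2015-01-01,10
a2,2015-01-03,0
"""

# Parse the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Group the minutes by account (the pure-Python analogue of a group-by).
minutes_by_account = defaultdict(list)
for row in rows:
    minutes_by_account[row["account"]].append(float(row["minutes"]))

# Summarize each group with its total and mean.
summary = {
    account: {"total": sum(minutes), "mean": sum(minutes) / len(minutes)}
    for account, minutes in minutes_by_account.items()
}

print(summary)
```

Libraries such as pandas express the same grouping step far more compactly (for example through `DataFrame.groupby`), which is the kind of convenience the NumPy and pandas lessons build toward.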

Project: Wrangle OpenStreetMap Data (60 hrs)
In this project, you'll use data munging techniques, such as assessing the quality of the data for validity,
accuracy, completeness, consistency and uniformity, to clean the OpenStreetMap data for a part of the
world that you care about.

Supporting Lesson Content: Data Wrangling with SQL

Lesson Title            Learning Outcomes

DATA EXTRACTION         Properly assess the quality of a dataset
FUNDAMENTALS            Understand how to parse CSV files and XLS with XLRD
                        Use JSON and Web APIs

DATA IN MORE COMPLEX    Understand XML design principles
FORMATS                 Parse XML & HTML
                        Scrape websites for relevant data

DATA QUALITY            Understand common sources for dirty data
                        Measure the quality of a dataset & apply a blueprint for cleaning
                        Properly audit validity, accuracy, completeness, consistency, and uniformity of a dataset

ANALYZING DATA          Identify common examples of the aggregation framework
                        Use aggregation pipeline operators $match, $project, $unwind, $group

SQL FOR DATA ANALYSIS   Understand how data is structured in SQL
                        Run queries to summarize data
                        Use joins to combine information across tables
                        Create tables and import data from CSV

CASE STUDY:             Use iterative parsing for large data files
OPENSTREETMAP DATA      Understand XML elements in OpenStreetMap
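
The sketch below shows one way iterative XML parsing and a SQL summary query can fit together, using Python's built-in `xml.etree.ElementTree` and `sqlite3` modules. The XML fragment is a tiny made-up sample in the style of an OpenStreetMap export, and the `node_tags` table and its columns are hypothetical names chosen for this example, not a schema prescribed by the course.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# A tiny, made-up fragment in the style of an OSM XML export.
osm_xml = b"""<osm>
  <node id="1" lat="37.77" lon="-122.41">
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="37.78" lon="-122.42">
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="3" lat="37.79" lon="-122.43">
    <tag k="amenity" v="bank"/>
  </node>
</osm>"""

# Hypothetical table for illustration; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_tags (node_id INTEGER, key TEXT, value TEXT)")

# iterparse yields each element as soon as its closing tag is read,
# so a large file can be processed without building the whole tree first.
for event, elem in ET.iterparse(io.BytesIO(osm_xml), events=("end",)):
    if elem.tag == "node":
        node_id = int(elem.get("id"))
        for tag in elem.iter("tag"):
            conn.execute(
                "INSERT INTO node_tags VALUES (?, ?, ?)",
                (node_id, tag.get("k"), tag.get("v")),
            )
        elem.clear()  # drop the node's children and attributes once processed

# Summarize the data: count nodes per amenity type.
counts = conn.execute(
    "SELECT value, COUNT(*) FROM node_tags "
    "WHERE key = 'amenity' GROUP BY value ORDER BY COUNT(*) DESC, value"
).fetchall()

print(counts)
```

With a real export you would pass a file path to `ET.iterparse` instead of an in-memory buffer, and auditing or cleaning steps would typically run before the insert.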

Project: Explore and Summarize Data (50 hrs)
In this project, you'll use R and apply exploratory data analysis techniques to explore a selected dataset for
distributions, outliers, and anomalies.

Supporting Lesson Content: Data Analysis with R

Lesson Title            Learning Outcomes

WHAT IS EDA?            Define and identify the importance of exploratory data analysis (EDA)

R BASICS                Install RStudio and packages
                        Write basic R scripts to inspect datasets

EXPLORE ONE VARIABLE    Quantify and visualize individual variables within a dataset
                        Create histograms and boxplots
                        Transform variables
                        Examine and identify tradeoffs in visualizations

EXPLORE TWO VARIABLES   Properly apply relevant techniques for exploring the relationship between any two variables in a dataset
                        Create scatterplots
                        Calculate correlations
                        Investigate conditional means

EXPLORE MANY            Reshape data frames and use aesthetics like color and shape to uncover information
VARIABLES

DIAMONDS AND PRICE      Use predictive modeling to determine a good price for a diamond
PREDICTIONS

Project: Test a Perceptual Phenomenon (20 hrs)
In this project, you'll use descriptive statistics and a statistical test to analyze the Stroop effect, a classic
result of experimental psychology. Communicate your understanding of the data and use statistical
inference to draw a conclusion based on the results.

Supporting Lesson Content: Inferential Statistics

Lesson Title            Learning Outcomes

ESTIMATION              Estimate population parameters from sample statistics using confidence intervals

HYPOTHESIS TESTING      Use critical values to make decisions on whether or not a treatment has changed the value of a population parameter

T-TESTS                 Test the effect of a treatment or compare the difference in means for two groups when we have small sample sizes
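
As a hedged sketch of how these ideas combine, the example below runs a dependent-samples (paired) t-test and builds a 95% confidence interval for the mean difference. The timings are invented for illustration and are not real Stroop data; the choice of a two-tailed test at alpha = 0.05 is an assumption, and the critical value 2.365 is the standard t-table entry for 7 degrees of freedom under that assumption.

```python
import math
import statistics

# Hypothetical completion times in seconds for 8 participants,
# each measured under both conditions. Values are made up.
congruent = [12.0, 16.0, 9.5, 14.0, 11.0, 13.5, 10.0, 15.0]
incongruent = [19.0, 21.5, 15.0, 22.0, 17.5, 20.0, 16.5, 24.0]

# A paired test works on the per-participant differences.
diffs = [i - c for c, i in zip(congruent, incongruent)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)      # sample standard deviation
std_error = sd_diff / math.sqrt(n)

# t statistic for the null hypothesis that the mean difference is zero.
t_stat = mean_diff / std_error

# Two-tailed critical value for alpha = 0.05 with n - 1 = 7 degrees of freedom,
# taken from a standard t-table (assumed significance level).
t_critical = 2.365
reject_null = abs(t_stat) > t_critical

# 95% confidence interval for the mean difference.
ci_low = mean_diff - t_critical * std_error
ci_high = mean_diff + t_critical * std_error

print(round(t_stat, 2), reject_null, (round(ci_low, 2), round(ci_high, 2)))
```

Because the same participants appear in both conditions, the test uses the differences rather than treating the two lists as independent groups.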

Project: Identify Fraud from Enron Email (50 hrs)
In this project, you'll play detective and put your machine learning skills to use by building an algorithm to
identify Enron employees who may have committed fraud based on the public Enron financial and email
dataset.

Supporting Lesson Content: Introduction to Machine Learning

Lesson Title            Learning Outcomes

SUPERVISED              Implement the Naive Bayes algorithm to classify text
CLASSIFICATION          Implement Support Vector Machines (SVMs) to generate new features independently on the fly
                        Implement decision trees as a launching point for more sophisticated methods like random forests and boosting

DATASETS AND            Wrestle the Enron dataset into a machine-learning-ready format in preparation for detecting cases of fraud
QUESTIONS

REGRESSIONS AND         Use regression algorithms to make predictions and identify and clean outliers from a dataset
OUTLIERS

UNSUPERVISED            Use the k-means clustering algorithm for pattern-searching on unlabeled data
LEARNING

FEATURES, FEATURES,     Use feature creation to take your human intuition and change raw features into data a computer can use
FEATURES                Use feature selection to identify the most important features of your data
                        Implement principal component analysis (PCA) for a more sophisticated take on feature selection
                        Use tools for parsing information from text-type data

VALIDATION AND          Implement the train-test split and cross-validation to validate and understand machine learning results
EVALUATION              Quantify machine learning results using precision, recall, and F1 score
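
To make the evaluation metrics concrete, here is a small sketch that computes precision, recall, and F1 score by hand from true and predicted labels. The labels are invented for illustration (1 marks a flagged person, 0 everyone else) and do not come from the Enron dataset; the helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for ten people: 4 actual positives, 3 predicted positives.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Accuracy alone can look good on a dataset where positives are rare, which is why precision and recall are reported separately and then combined into F1.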

Project: Make an Effective Visualization (20 hrs)
In this project, you'll create a data visualization, using Tableau, from a dataset that tells a story or highlights
trends or patterns in the data. Your work should be a reflection of the theory and practice of data
visualization, harnessing visual encodings and design principles for effective communication.

Supporting Lesson Content: Data Visualization with Tableau

Lesson Title            Learning Outcomes

DATA VISUALIZATION      Understand the importance of data visualization
FUNDAMENTALS            Know how different data types are encoded in visualizations

DESIGN PRINCIPLES       Select the most effective chart or graph based on the data being displayed
                        Use color, shape, size, and other elements effectively

CREATING                Become proficient in basic Tableau functionality, including charts, filters, hierarchies, etc.
VISUALIZATIONS WITH     Create calculated fields in Tableau
TABLEAU

TELLING STORIES WITH    Create Tableau dashboards and stories to effectively communicate data
TABLEAU