Table of Contents
Introduction
  The Roadmap Project
  Campus Functional Needs
  Scope of the Architecture
  Architecture Requirements
  Components of the Architecture
Business Intelligence: Applications Architecture
  Reporting.
  Dashboards and alerts.
  Reporting integrated with production applications.
  Query and analysis.
  Advanced applications: modeling, forecasting and planning.
  Summary of differences from current architecture.
Data Architecture
  Contents: the warehouse bus and the logical data architecture.
  Implementation: databases and the physical data architecture.
  Sourcing strategy
  Metadata architecture
  Summary of differences from current data architecture
Security Architecture
  Summary of differences from current architecture
Technology Architecture
  DW delivery environment
  Database technology
  Summary of differences from current architecture
Support Architecture
  Skills and organization
  Processes
  Summary of differences from current architecture
Appendices
  UC Berkeley Fact/Dimension Data Matrix
  Architecture Requirements Gap Analysis
August 30, 2006
Page 2
UC Berkeley Data Warehouse Roadmap
Data Warehouse Architecture
Introduction
The Roadmap Project
In the summer and fall of 2005, the Berkeley Campus of the University of California undertook a
project to develop a plan for expanding its existing reporting systems into an enterprise data
warehouse (EDW). This initiative originated as a project of the campus Data Stewardship Council.
Many members of the Council, as well as other campus leaders, felt there were many
opportunities for the campus to realize large benefits by reusing information collected by
operational processes in order to support better decision making and to improve processes.
The roadmap project was chartered by executive sponsor William Webster, Vice Chancellor,
Business and Finance. The campus CIO, Shel Waggener, was a co-sponsor. The project
consisted of two phases.
The first phase was an interview-based study which focused on clarifying the need for an EDW
and identifying specific opportunities for an EDW to contribute to important campus goals. The
object of this phase was to help the campus executive management understand the business
case for investing in an EDW and to agree on priorities for guiding an incremental development
expected to occur over several years. This business study phase was led by functional managers
and staffed with analysts from several campus units as well as IS&T.
The second phase focused on developing an architecture to deliver the EDW specified in the
business requirements study. The architecture was developed by a cross-functional team
including functional business analysts and technical personnel. It was led by the campus data
architect. This document summarizes the architecture developed by that team.
Campus Functional Needs
During the study phase, the analysis team met with a cross-section of the campus leadership,
including both academic and administrative functions. The Data Stewardship Council oversaw
selection of the interview participants. Those included three different kinds of informants:
1. Campus executive leadership, including two vice chancellors and a vice provost.
2. Campus operating management, including faculty and other instructors, researchers,
budget managers and leaders of administrative support units.
3. Representative data providers: personnel involved in managing and delivering campus
information systems.
The interviews covered the principal operating challenges of the campus and helped identify
specific opportunities for well-organized information from operations to contribute to managing
those challenges. The scope of the study included the primary functions of teaching, research
and public service, plus the support functions which directly support those primary functions.
Interviews with data providers helped establish whether required information was available and
of sufficient quality. Resulting conclusions were reviewed with informants, other functional
experts and the Data Stewardship Council.
The study identified eleven high-priority business opportunities, covering areas as diverse as
course planning, purchasing and fundraising. In addition, the study summarized important
criteria critical to the success of an EDW on the campus. These results were forwarded to Vice
Chancellor Webster for use in executive planning of the enterprise data warehouse. Working
with the campus CIO, Shel Waggener, VC Webster formed an executive group to consider
priorities, funding and governance for the EDW.
The results of the architecture were published to the campus community in December, 2005. A
copy of the final report can be found on the EDW project website.
Scope of the Architecture
The architecture of the enterprise data warehouse is designed to deliver the analysis capabilities
defined in the business requirements document just referenced and likewise to provide the
critical success elements defined there. In addition, the warehouse is expected to deliver any
additional analysis capabilities delivered by existing campus decision support systems which
were not explicitly documented in the business requirements document. These systems include
the following:
- BAIRS
- BIS
- Cal Profiles
- The pilot student data warehouse
- FASDI
- The Office of Student Research database/reporting file.
Architecture Requirements
Based on the business requirements and success factors established in the business
requirements study, the architecture team investigated and documented the more specific
architectural requirements which would govern development of the architecture. These
included identifying potential users, defining security requirements, skills requirements, etc.
Those architectural requirements are summarized and reported in a document which can be
found on the EDW project website.
Based on documented functional business requirements and derived architecture requirements,
the team worked with personnel knowledgeable about the existing data warehouse to identify
important gaps which needed to be addressed in the architecture. That gap analysis is
summarized in a document included as an appendix to this document.
Components of the Architecture
Though it's easy to think of the data warehouse as just a big collection of data, in fact delivering
an effective data warehouse requires a large set of related capabilities (see Figure 1).
Certainly data is the fundamental component: cleaned, organized data, mostly extracted from
the campus operational systems. Making that data useful to a variety of campus personnel,
though, requires some applications to deliver and explain it. These applications range from
predefined reports through query tools to complex tools for analysis and modeling. Delivering
data and applications and securing the data as specified by campus data stewards requires a set
of technology, most of it centralized in secure computer locations. Equally important,
transforming operational data into a shared resource useful across the boundaries of functional
business domains requires a broad set of functional skills, organized appropriately and working
through proven processes.
The architecture for the data warehouse is described in terms of four interrelated dimensions:
1. Applications (or the business intelligence layer).
2. Data.
3. Technology and security.
4. Support processes and organization.
Figure 1: There's more to the data warehouse than just data
Business Intelligence: Applications Architecture
For the information in the data warehouse to be valuable, it needs to be delivered in a way
that makes it useful to campus personnel in doing their jobs. This is the job of business
intelligence applications. For most people, these applications are the data warehouse. They are
the software systems which help users understand what has happened, identify problems and
opportunities, and make and evaluate plans. The warehouse includes a variety of these tools
because there are a variety of users with quite different needs and skills.
Everybody on campus uses information to do their jobs, but they differ greatly in how well they
understand available information and its interconnections and also in how comfortable they are
in using information to help them make decisions. The data warehouse applications form a
toolbox with tools appropriate for the spectrum of campus personnel from deans to
departmental administrative assistants, from students to analysts, from researchers to
executives.
The following diagram summarizes the EDW applications:
The architecture provides three kinds of applications:
1. Reporting, or information delivery, including variants such as dashboards and alerts.
2. Query.
3. Modeling, Planning and Forecasting.
Reporting.
Reporting applications deliver information in a form which is useful to users. In their simplest
form (fixed, printed reports) these tools are as old as computers. They imply a partnership
between users who need information and specialists who help design reports, displays and
graphics to deliver the required information. The best reports provide just the information a
user needs for a specific purpose, delivered in a way which makes the information usable and
actionable.
Reporting accounts for most of the campus's current data warehouse capability. Systems such
as BAIRS and Cal Profiles focus on reporting.
The UCB data warehouse will provide a starter set of predefined reports for each distinct group
of users. These reports, like the existing BAIRS reports, will be configurable by users to deliver a
specific subset of information. They will be delivered through a secure portal which will make it
easy for users to further customize the reports and to share both report requests and results
with others in their workgroups. In addition, the portal will permit users to schedule routine
reports, to print reports and to share them as electronic documents in other formats, such as
Portable Document Format (PDF).
The reports will be designed for easy understandability. In addition, they will be accompanied
by high-quality data documentation expressed in language which communicates with the
expected users. This documentation will be delivered through the reporting portal.
As appropriate, reports will use graphics, such as charts and data graphs, to make information
more comprehensible.
Dashboards and alerts.
In the interest of delivering just the information most useful to support decision making or
action, it will be useful to complement conventional reports with information dashboards and
alerts. Dashboards communicate information with quickly comprehended graphics such as dials
and meters. Typically dashboards are used to report on established performance indicators,
measured at predefined intervals. Properly planned, dashboards make it easy for personnel
managing a complex process, such as undergraduate admissions, to quickly assess
current state and progress against goals. Likewise, sometimes reporting will be confined to
exception conditions. Exceptions may be delivered periodically as reports. More often, though,
they will be delivered as alert messages: emails or similar notifications. For example,
departmental administrators might get alerts when course enrollments exceed critical
thresholds. Or a grant administrator might get an email when grant expenditures exceed plan
for a given period.
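The enrollment alert described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the EnrollmentSnapshot record, the enrollment_alerts function, the 90% threshold and the sample courses are assumptions chosen for illustration, not part of the EDW design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrollmentSnapshot:
    """Hypothetical record: one course's enrollment at a measurement point."""
    course: str
    enrolled: int
    capacity: int


def enrollment_alerts(snapshots, threshold=0.9):
    """Return one alert message per course whose fill ratio meets the threshold.

    The 0.9 default is an illustrative assumption; a real threshold would be
    set by the department receiving the alert.
    """
    alerts = []
    for s in snapshots:
        if s.capacity > 0 and s.enrolled / s.capacity >= threshold:
            alerts.append(f"{s.course}: {s.enrolled}/{s.capacity} seats filled")
    return alerts


# Made-up sample data: only the first course crosses the threshold.
snapshots = [
    EnrollmentSnapshot("PHYSICS 7A", 285, 300),
    EnrollmentSnapshot("HISTORY 5", 120, 200),
]
alerts = enrollment_alerts(snapshots)
```

In practice the messages would be handed to an email or notification service; the sketch stops at producing the exception list, which is the part the warehouse itself would supply.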
Reporting integrated with production applications.
Some reporting will be most useful when it is delivered in the context of some other computer-
based process. For example, a student who is registering for courses might benefit from
knowing about other students who took similar courses: how many took the same set of
courses at once and how well they succeeded; how many completed the courses and their
average grades. A researcher applying for a grant might benefit from knowing about previous
submissions to the same sponsor in the same research area. To enable such in-context
reporting, the data warehouse will provide reporting services which will integrate with other
software applications.
Query and analysis.
People who have analytical skills and jobs requiring analysis will need the ability to explore the
information in the warehouse. For example, a member of a committee interested in improving
educational outcomes might want to explore how use of the summer program affects
outcome measures and how that varies for students with different majors or different economic
backgrounds. Enabling analysis of this kind is one of the great powers of the EDW. Using that
power requires understanding the information in the warehouse and knowing how to select
data, summarize it or drill down for further detail, and particularly how to combine information
across subject areas. The EDW provides two kinds of tools for this function:
1. Explorers: tools for selecting data, drilling down or summarizing data, and combining
data across subject areas.
2. Special-purpose information views which organize the information in the warehouse
into simpler structures which are easily understood and navigated by particular kinds of
users.
The information in the EDW is from the outset organized for understanding, consistency and
ease in accurately combining information across subject areas (see Data Architecture).
However, many users of the warehouse focus on some aspects of the campus business. For
these users, much of the information in the warehouse is irrelevant and unfamiliar. The
job-focused views present meaningful subsets relevant to and understandable by particular sets of
users. These views are developed over time as new kinds of users begin using the warehouse.
In particular, analytical users will have access to views which make it easy to use the
dimensional structure of the data warehouse to slice and dice information. These same
views will make it easy for users to summarize information based on relevant characteristics,
such as demographics, without having access to protected identifiers such as student IDs or
Social Security numbers.
The EDW will provide a good, general-purpose explorer/query tool which is well suited to the
task of selecting and combining data from the dimensional structures of the data warehouse.
This tool will have capabilities for presenting selected information in easily understood report
format and additionally as graphs, charts or other visualizations. Additionally, the
warehouse will provide a multidimensional analysis tool for delivering data as cubes or
multidimensional structures which lend themselves to easy slicing and dicing. (This analysis
approach is often referred to as Online Analytical Processing or OLAP.) Both tools will allow
users to extract information for further analysis with desktop analysis tools such as
spreadsheets. Both query tools will deliver clear, easily understood and user-focused
information documentation.
Advanced applications: modeling, forecasting and planning.
Many analytical users want to use the historical data in the warehouse to build models of
alternative scenarios, trying to predict, for example, the consequences across a variety of
business areas, such as course enrollments, student aid demand, and fees revenue. The
warehouse will provide several facilities enabling this kind of modeling:
- Good integration with external modeling tools, ranging from desktop spreadsheets to
  sophisticated data mining tools.
- Multidimensional tools for building large models directly in the warehouse database,
  operating with spreadsheet-like functions but at a scale beyond the capability of
  spreadsheets.
- Facilities for saving and sharing multidimensional models across a workgroup.
Many campus users need to be able to understand the current status of their finances against
plans: not merely budget allocations, but more detailed plans for using the budget. The most
frequent example involves research projects. Because staff migrate among research projects,
project managers need tools for expressing a spending plan for a research project, especially a
multi-year project, showing labor forecasts by individual. Then, with that plan made, the project
managers need to track expenditure against plan. The solution for this need involves
collaboration between the data warehouse and a transactional application for recording plans.
Once plans are recorded, they are migrated into the data warehouse, where they are available
for reporting and query in combination with actual activity. Other units besides research have
similar needs: IST, for example, needs to project important financial events such as license
renewals and track actual expenditures against such projections. As in the case of salary
encumbrances for research projects, the data warehouse is part of the solution for this need. A
transactional application records projections, which are carried forward into the data
warehouse where they are available for analysis alongside current activity.
There is no provision in the DW architecture for more robust enterprise planning applications, as
business users have not identified a need for such applications. However, the warehouse
architects should review this area within three to five years, as many organizations of
comparable size, including major research universities, use such planning applications to
manage the general problem of coordinating plans across business areas under conditions of
uncertainty. During campus interviews, informants often talked about the need to manage that
problem here at UC Berkeley.
Summary of differences from current architecture.
Most of this Business Intelligence architecture consists of new capabilities.
- Today's decision support systems, such as BAIRS and Cal Profiles, are oriented toward
  reporting. They provide predefined reports, with some facilities for configuring
  general-purpose reports for specific data populations, showing accounting activity
  reports, for example, for only a specific set of organizational units or funds. Both these
  systems do a good job of delivering business data documentation alongside the reports.
  BAIRS provides facilities for exporting report results to spreadsheet tools for analysis.
- The provision of facilities for dashboard presentation and for alerts is new, as are
  business intelligence services which can be combined with other applications.
- Most important, all the facilities for query and analysis are new, as is support for
  modeling and planning. The use of OLAP analysis is new. Business users during the
  requirements survey talked about their need to query data directly rather than
  depending on predefined reports.
Data Architecture
There are two important views of the EDW data architecture. The first is the functional view of
the information the EDW stores and delivers. This view concerns content and meaning. The
second is the view of warehouse data as a set of software components, mostly interrelated
database tables. This view concerns access by query and reporting tools. These two views are
described as the Contents architecture and the Implementation architecture, respectively.
Contents: the warehouse bus and the logical data architecture.
Basically, the enterprise data warehouse is a font of information about what happened. It is a
history of the operations of the campus drawn from the campus information systems, which
record important information in the process of helping to carry out operations. Additionally, the
warehouse delivers corresponding information about plans such as forecasts and budgets.
Being able to investigate and understand in detail what happened enables a cycle of finding
problems and opportunities, crafting plans for doing things better, then measuring to learn
whether those plans worked out. Tracking plans as well helps campus personnel compare what
actually happened to what was planned in order to identify surprises, allowing for faster
responses to the unexpected and also supporting improvements in the planning and forecasting
process. So the contents of the warehouse boil down to history and plans linked to consistent
information about the context of those events or planned events.
Thus, the contents of the data warehouse have two components:
1. Information about history and plans. These are referred to as facts, as they usually
consist of discrete facts or measurements.
2. Information about the context in which these events or measurements occur. This
context information is organized along consistent dimensions. Sample dimensions
include time, organization, and student information.
These context dimensions provide the mechanism which enables a shared, enterprise data
warehouse. Combining or comparing information from different subject areas usually involves
lining up different facts along the same dimensions. For example, comparing faculty workload,
student teaching contact and resources used involves combining facts about teaching
assignments, course enrollments, and expenditures along common dimensions of organization
and time.
Most of the data mismatch or "dueling data" which makes it hard to agree on reports and
measurements comes from using different versions of the same dimensions to select and
summarize data. Having a common set of dimension definitions, on the other hand, makes
comparing data across areas feasible and understandable.
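As a rough illustration of lining up facts along shared dimensions, the sketch below combines two hypothetical fact sets (course enrollments and expenditures) on the common dimensions of organization and term. The record layouts, the drill_across function and all figures are invented for this example and do not come from the campus systems.

```python
from collections import defaultdict

# Hypothetical facts from two subject areas. Both use the same values for the
# shared "org" and "term" dimensions, which is what makes them comparable.
enrollment_facts = [
    {"org": "Physics", "term": "2005-Fall", "enrollments": 1200},
    {"org": "History", "term": "2005-Fall", "enrollments": 800},
]
expenditure_facts = [
    {"org": "Physics", "term": "2005-Fall", "amount": 600000.0},
    {"org": "History", "term": "2005-Fall", "amount": 200000.0},
]


def drill_across(enrollments, expenditures):
    """Combine two fact sets on (org, term) and derive cost per enrollment."""
    combined = defaultdict(lambda: {"enrollments": 0, "amount": 0.0})
    for row in enrollments:
        combined[(row["org"], row["term"])]["enrollments"] += row["enrollments"]
    for row in expenditures:
        combined[(row["org"], row["term"])]["amount"] += row["amount"]
    # Skip cells with no enrollments to avoid dividing by zero.
    return {
        key: cell["amount"] / cell["enrollments"]
        for key, cell in combined.items()
        if cell["enrollments"]
    }


cost_per_enrollment = drill_across(enrollment_facts, expenditure_facts)
```

If one source labeled the department "PHYS" and the other "Physics", the two facts would land in different cells and the comparison would silently fail, which is the mismatch that shared dimension definitions are meant to prevent.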
Because context dimensions usually are shared across subject areas, one of the most important
and challenging aspects of data design for the warehouse consists of identifying shared
dimensions and making sure that everyone is using a set of shared definitions. The set of
shared dimensions represents a powerful interpretive grid for making sense of diverse facts.
They also enable iterative development of the warehouse. New fact tables can be sourced from
operating systems and linked to earlier components along existing shared dimensions. The data
warehouse grows by adding data marts one at a time to the warehouse. A data mart is a fact
table and its associated dimensions. A set of related fact tables makes up a data subject area.
Considerable investigation and analysis went into identifying the basic facts which make up the
campus data warehouse and their relationship to context dimensions. Together, the basic facts
and dimensions represent the base content architecture of Berkeley's data warehouse. That
content architecture is summarized in a table of facts and dimensions attached as an appendix
to this document.
Following is an extract from that fact/dimension matrix:
Key data marts by subject area and their shared dimensions (EXTRACT)
Dimensions (column headings, some truncated in the extract):
- Date (semester, census, add/drop, fiscal, effective, reported, ...
- Facility
- Major (all degree-granting programs and interrelatedness)
- Course (all courses incl. all sections, offering dept, etc.)
- Student (reg status, residency, major, ...
- Applicant for admission
- Financial Aid Program
- External Institution (high schools, other higher ed, SAT ...
- Research Sponsor (govt agency, foundation, corporation, ...
- Employee (descriptive info: fac, staff, affiliates, etc.)
- Job Applicant
- Budgeted Staff Position (budget dimension for perm ...
- Appointment (title, job code, leave status, etc.)
- Job Classification
Data marts (rows), by subject area:
- Teaching: Teaching assignment; Classroom space assignment; Student class enrollment;
  Waitlist enrollment; Degree; Student course evaluation; Student survey
- Student: Undergraduate admissions; Graduate admissions; Semester snapshot;
  Financial aid awards; External test results; Activity membership
(The cell markings relating each data mart to its dimensions are given in the full matrix in
the appendix.)
Implementation: databases and the physical data architecture.
The implementation data architecture is summarized in Figure 3.
It contains these principal components:
1. The warehouse itself. This is a collection of tables and views consumed by end users,
directly or through the applications tools described above. The data in the warehouse
has been processed for consistency and alignment with standard data descriptions and
value sets. Facts have been aligned with the standard dimensions.
2. A staging environment. This is a set of databases and files used by the data
maintenance process to prepare data for publication in the warehouse as it flows from
the operational systems which collect it originally. (This maintenance process is often
referred to as Extract-Transform-Load, or ETL.) This environment is not directly
accessed by users of the warehouse.
3. A collection of metadata: descriptive information about the contents of the
warehouse, about the source systems, about the business meaning of information,
about access and privacy rules, and about the processes which create and transform
data before it appears in the warehouse. This metadata is described in greater detail
below. Though managed as a consistent set, the metadata itself is stored in repositories
distributed among the warehouse, the staging environment and the source systems.
(Though the architecture diagram suggests that the implementation is a single, centralized
database environment, in fact any of the components may be distributed across a variety of
database servers and even a variety of DBMSs, as discussed under DBMS platform, below.)
Figure 3: Implementation Data Architecture
The data warehouse component is the largest and most complicated of these components. It is
worth describing that component in detail.
The core of this component, and of the warehouse itself, is a set of linked fact and dimension
tables. These implement the facts and dimensions described above. Normally there will be a
single fact table for any given kind of fact in the warehouse. A single row in the fact table
contains the measures related to that fact at the lowest level of granularity appropriate to the
fact. For example, a row in the student course enrollment fact table will contain information
about a change in a single student's enrollment in a specific section of a course taught in one
semester. Each row in a fact table will also contain a foreign key reference to a dimension table
for each relevant dimension.
Facts are usually navigated as sets identified by characteristics specified in related dimension
tables. An example would be a count of all the course enrollments by seniors in Physics classes
meeting before 9 a.m. in the past two spring semesters.
Both fact and dimension tables are identified with system-assigned surrogate keys. This buffers
the tables from anomalies in source systems and allows for historical variation. It also makes it
easier to mask off natural identifiers, which is often necessary to meet security goals.
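The example query above can be sketched against a toy star schema. This is only an illustration using SQLite through Python's standard sqlite3 module; the table and column names (dim_student, dim_course, dim_term, fact_enrollment, enrollment_change) and all rows are hypothetical, and the actual warehouse schema and DBMS may differ. Each fact row records a change in enrollment (+1 for an add, -1 for a drop), and every table is keyed by a surrogate key with no natural identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, class_level TEXT);
CREATE TABLE dim_course  (course_key INTEGER PRIMARY KEY, department TEXT,
                          start_hour INTEGER);
CREATE TABLE dim_term    (term_key INTEGER PRIMARY KEY, season TEXT, year INTEGER);
-- One row per change in one student's enrollment in one course in one term.
CREATE TABLE fact_enrollment (
    student_key INTEGER REFERENCES dim_student(student_key),
    course_key  INTEGER REFERENCES dim_course(course_key),
    term_key    INTEGER REFERENCES dim_term(term_key),
    enrollment_change INTEGER
);
INSERT INTO dim_student VALUES (1, 'Senior'), (2, 'Junior'), (3, 'Senior');
INSERT INTO dim_course  VALUES (10, 'Physics', 8), (11, 'Physics', 13),
                               (12, 'History', 8);
INSERT INTO dim_term    VALUES (100, 'Spring', 2005), (101, 'Fall', 2005),
                               (102, 'Spring', 2006);
INSERT INTO fact_enrollment VALUES
    (1, 10, 100,  1),   -- senior adds early Physics, Spring 2005
    (3, 10, 102,  1),   -- senior adds early Physics, Spring 2006
    (3, 10, 102, -1),   -- the same senior drops it again
    (2, 10, 102,  1),   -- junior: excluded by class level
    (1, 11, 102,  1),   -- afternoon Physics: excluded by start hour
    (1, 12, 100,  1),   -- History: excluded by department
    (3, 10, 101,  1),   -- Fall term: excluded by season
    (1, 10, 102,  1);   -- senior adds early Physics, Spring 2006
""")

# The fact set is selected entirely through characteristics held in dimensions.
senior_count = conn.execute("""
    SELECT COALESCE(SUM(f.enrollment_change), 0)
    FROM fact_enrollment f
    JOIN dim_student s ON s.student_key = f.student_key
    JOIN dim_course  c ON c.course_key  = f.course_key
    JOIN dim_term    t ON t.term_key    = f.term_key
    WHERE s.class_level = 'Senior'
      AND c.department = 'Physics'
      AND c.start_hour < 9
      AND t.season = 'Spring'
      AND t.year IN (2005, 2006)
""").fetchone()[0]
```

Because the dimension rows carry only surrogate keys and descriptive attributes, the query answers the question without ever touching a student ID.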
Cubes are specialized views of a set of facts and dimensions. They take a form very similar to
spreadsheets, in the sense that they are composed of cells visualized along a set of axes. Unlike
spreadsheets, however, cubes are provided directly through database technology often called
Online Analytical Processing, or OLAP. OLAP cubes can be much, much larger than any
spreadsheet. The axes in cubes are provided by the values of hierarchical dimensions. So for
example, rows might be all individual course offerings. OLAP cubes are very useful for certain
kinds of analysis and pattern recognition. Personnel such as budget analysts frequently find
cubes useful views of warehouse information. Such personnel often use database cubes in the
process of making models or projections. Those models can be stored as user cubes alongside
actual warehouse contents and shared with other users.
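To make the idea of cells arranged along hierarchical axes concrete, here is a small in-memory sketch. It is not an OLAP engine; the build_cube function, the two-level college/department hierarchy and the figures are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (college, department, term, enrollments).
# College and department form a two-level hierarchy on the row axis.
facts = [
    ("Letters & Science", "Physics", "Spring 2006", 40),
    ("Letters & Science", "Physics", "Fall 2005", 35),
    ("Letters & Science", "History", "Spring 2006", 25),
    ("Engineering", "Civil Engineering", "Spring 2006", 30),
]


def build_cube(rows, row_level):
    """Aggregate enrollments into cells keyed by (row member, term).

    row_level selects the hierarchy level used for the row axis:
    0 = college, 1 = department.
    """
    cube = defaultdict(int)
    for row in rows:
        cube[(row[row_level], row[2])] += row[3]
    return dict(cube)


by_department = build_cube(facts, row_level=1)  # detailed view
by_college = build_cube(facts, row_level=0)     # rolled up one level
```

Moving from by_department to by_college is a roll-up along the hierarchy, and the reverse is a drill-down. An OLAP server performs the same kind of aggregation inside the database and at far larger scale.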
Sourcing strategy
As a matter of principle, data in the warehouse is sourced from its authoritative systems of
record, as certified by the data's organizational custodians. Generally, identifying the
appropriate source of fact information is not difficult, as fact tables usually are sourced from a
single system. Dimension tables, however, are more complicated. Information in dimension
tables, particularly for shared dimensions, often is derived from several systems. Moreover,
manual intervention may be required to provide data not in any single source system. For that
reason, there is a dimension owner assigned to each dimension. The owner is a functional
custodian responsible for establishing the content of dimension tables and publishing changes
to all users. For example, Fund is a widely shared dimension. Though much of the descriptive
information about funds is sourced from the financial system, other information about fund
sponsors, donors, participating principal investigators, etc., needs to be derived from other
sources. The owner of the Fund dimension is responsible for providing a process for tracking
relevant changes and ensuring that changes to information about funds are published to all
users.
The data in the warehouse is subjected to continuing data quality assurance. Before a subject
area is added to the warehouse, the data in that subject area is analyzed for consistency,
completeness and accuracy. Results are reviewed with data custodians and overall quality is
confirmed before the data is made available for use in the warehouse. Routine processes for
validating the quality of the data are incorporated in the routine maintenance (ETL) process.
Metadata architecture
The diagram below shows the basic elements of a long-term vision for EDW metadata
management. The key to this architecture is the Master Metadata Repository, the central
clearinghouse for all EDW metadata. Built on top of this repository are the applications needed
to publish its contents to the platforms and processes shown at the top of the diagram.
The Master Metadata Repository (MMR) is functionally part of the EDW as a whole, refreshed
by its own set of ETL processes and dynamic interfaces that extract metadata from the sources
shown at the bottom of the diagram. These processes should be designed to seamlessly
integrate metadata access and modification with data collection, analysis, maintenance, and
development activities.
The MMR serves users beyond the EDW community. Because the MMR contains metadata from
systems outside of the EDW, it helps users of those systems as well. It also will support IST
processes such as systems development, change management, and technical support.
This metadata management framework will need to be implemented incrementally, the first
phase being the creation of a Master Metadata Repository. Subsequent improvements should
serve to deliver incremental value to the EDW and its customers until the final future state is
reached.
Figure 4: Metadata maintenance
(Diagram labels: BI Tools; EDW Metadata Portal; ETL Tools; Database Management Tools;
Change Management Processes; Master Metadata Repository; BI Portal; ETL Metadata
Repository; Sharepoint Portal Server; EDW Metadata Repository; Other Metadata Repositories.)
Summary of differences from current data architecture
- The biggest difference from the current architecture is the addition of a large number of
  new data subject areas, broadening the warehouse to include comprehensive
  information about activity in the core areas of campus activity.
- The second biggest difference is the provision of an accurate fact/dimension data
  architecture, implemented with shared dimensions and managed by dimension owners.
  The new data architecture rectifies the overlaps and inconsistencies between the
  existing BIS and BAIRS data and corrects those systems to align facts with standard
  dimensions using the standard surrogate key mechanism.
- Additionally, the provision of simplifying database views based on the functional needs
  of various user groups is an extension of current practices.
- The provision of OLAP cubes is new.
- Consolidated metadata management is new.
Security Architecture
The goal of the security architecture is to provide fine-grained control over access to data,
administered according to the policies of appropriate data custodians. This includes managing
access at the individual data element level, but it also includes considerably more. Current data
access requirements mean that sometimes the warehouse has to control access to information
within a particular context, such as information about students who have taken certain classes
or studied with particular instructors. Restrictions on small sample sizes imply that for some
uses, data access is restricted to answer sets large enough that they don't implicitly identify
individual persons. Meeting all these requirements is done by a set of facilities, some
automated in the databases and some in the reporting portal.
The security architecture is summarized in the following diagram:
Summaryofdifferencesfromcurrentarchitecture
ImproveSystemAccessProcessforNewHires
Createanewautomatedprocesstoprovisionaccessprivilegesfornewusers,especiallythose
fromvolatileusergroupssuchasstudents.Thiscanbeachievedinitiallywithanapplication
thatpopulatesnewuserroleswithinSARAbasedondataobtainedfromtheHRManagement
System.AsthecampusIdentityandAccessManagement(IAM)systemevolves,theEDWaccess
provisioningwilladapttoalignwithrolebasedaccessprovisioningmediatedbytheIAM.
IncorporateSecurityProcessImprovementsintoEDWLifecycleComponents
Includeasecurityrequirementcomponentforallnewinitiatives,andtrackanychangesthrough
analysis,design,developmentandimplementation.Inaddition,instituteregularreviewofall
existingsecurityrequirementsbybusinessstakeholdersanddatastewards,especiallywhen
majorchangestotheEDWenvironmentarebeingplanned.
ImproveMaintenanceofSystemAccessPrivilegesasUserRoleschange
Thiscanbeachievedthroughoneofthefollowingmethods:
Provideuserliststosupervisorsonaregularbasis(e.g.onceayear)toconfirmrolesare
beingproperlyassigned.Currently,supervisorsarenotconsistentlytrackedwithahigh
degreeofaccuracywithinHRMSthiswillneedtobeaddressedinordertoimplement
thissolution.
CreateanautomatedprocessthatupdatesuserroleswithinSARAbasedonchanges
withinHRMS.
Addcontextbasedauthorizationfeaturestoexistingsecuritycontrolprocess
ModifySARAtoprocessnewcontextbaseduserroles(e.g.instructor)usingnew
securityfieldswhichwillbestoredandtrackedinarevisedBAIRBISSecurityTable.
ModifyexistingapplicationstoloadnewrolesandassociatedsecurityfieldstoEDW
securitytables.
ModifyHyperionPortalSecurity,OracleRoleAssignments,andOracleVPDto
implementproperfilterstorestrictuseraccesstoEDWdatabasedonnewroles.
Avoid exposing personal information by implementing the following techniques.
o Use surrogate keys. In many cases, personal identifiers (e.g. name, student id,
employee id, social security number) can be entirely masked without reducing the
power of the data warehouse to deliver useful information about student and employee
behavior, distribution by demographics, etc. The enabling mechanism is the use of
substitute identifiers known as surrogate keys. Using surrogate keys provides many
other benefits, including ease in tracking variation over time in information about
students, employees, etc. The EDW should use surrogate keys in most cases.
o Use views to exclude small sample sizes from aggregate queries. This may require the
creation and maintenance of tables which track minimum sample sizes based on subject
area or other criteria. This approach should be reevaluated as detailed requirements
for small sample sizes are developed. In particular, views can be used more effectively if
they are based on universally applied rules rather than rules which are contingent on
user roles or other contextual information.
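The two techniques above can be illustrated with a minimal sketch. It assumes in-memory rows and a single universally applied minimum group size; the class and function names, the `student_id` and `student_key` fields, and the threshold are hypothetical, and a real warehouse would implement the same ideas with key-mapping tables and database views.

```python
from collections import Counter
from itertools import count


class SurrogateKeyMap:
    """Assigns a stable, meaningless integer key to each natural identifier."""

    def __init__(self):
        self._keys = {}
        self._next = count(1)

    def key_for(self, natural_id):
        if natural_id not in self._keys:
            self._keys[natural_id] = next(self._next)
        return self._keys[natural_id]


def mask_rows(rows, key_map, id_field="student_id", key_field="student_key"):
    """Replace the personal identifier in each row with its surrogate key."""
    masked = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != id_field}
        out[key_field] = key_map.key_for(row[id_field])
        masked.append(out)
    return masked


def counts_with_suppression(rows, group_field, min_size):
    """Count rows per group, omitting groups smaller than min_size."""
    counts = Counter(row[group_field] for row in rows)
    return {group: n for group, n in counts.items() if n >= min_size}
```

The key map itself links surrogate keys back to personal identifiers, so it would need to be stored under tighter access control than the masked data.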
Reconcile EDW security architecture with the evolving campus Identity and Access
Management (IAM) initiative.
IST is leading a substantial initiative to improve identity management on the campus. The EDW
security architecture is designed to use Calnet and the rest of the campus IAM system. As
necessary, the EDW will evolve to align with changes in the IAM. One example of changes to
IAM is the move to assign the Calnet user id based on the UID field. This would ensure that
campus users who are both students and employees would have a single consistent user id. As
the IAM evolves to provide consistent role definitions, the EDW access provisioning would evolve
from the existing SARA system to use the IAM roles to help with provisioning access privileges.
Technology Architecture
Technology required for delivering the data warehouse is summarized in the following diagram.
This technology architecture does not differ from the technology supporting BAIRS. Indeed,
over the past few years, technology to support business intelligence has matured, consolidated,
and simplified somewhat. In particular, it is now possible and desirable to eliminate the welter
of desktop-based query tools and corresponding database connection protocols which were a
feature of older business intelligence implementations.
Some of the architecture components are described in more detail below.
Figure 5: DW Technology Components and Protocols
DW delivery environment
As illustrated above, query and reporting functions are automated by a server-based Business
Intelligence platform. That platform interacts with users mainly through browser-based query,
reporting and analysis tools. In addition, the platform interacts directly with some office tools,
notably Microsoft Excel, again using basic web interfaces.
The BI platform delivers the query and analysis functionality described in the Applications
Architecture section above. It interacts with the DW databases, both the dimensional RDBMS
platform and the MDDB (OLAP) platform.
Much of the database understanding functionality described above and some of the security
features are automated by the BI platform. Desktop-based query tools cannot deliver such
functionality alone and therefore are not included in the architecture.
For BAIRS, the functions assigned to the BI platform are delivered by the Hyperion toolset.
However, the currently installed version does not deliver all of the functions assigned to this
component. During implementation of the architecture, the campus will need to evaluate
Hyperion against competing products and choose an implementation platform. Though there
are several capable BI platform products, they are by no means standard. As a result, an
effective BI platform for the campus should be a single product set. For that reason, the
architecture specifies the BI platform as a single technology component.
In addition to the basic data access/BI platform, the data warehouse will include over time a
variety of analysis, modeling and reporting tools. These will interoperate directly with the DW
databases and with users through browser-based interfaces. A data mining toolset is an
example of such a tool, as is a comprehensive enterprise planning system.
Database technology
As described in the data implementation architecture above, the data warehouse uses two kinds
of database management systems: a relational DBMS, which contains the original copy of all
warehouse data, and a multidimensional DBMS, which stores and delivers OLAP cubes.
Today, all the centrally managed warehouse data is stored in Oracle; that is likely to continue to
be true. However, there is nothing in the architecture which limits the warehouse to one kind of
RDBMS. It may be practical to implement some data subject areas in smaller, less costly RDBMS
products such as SQL Server 2005.
On the other hand, multidimensional databases are only loosely standardized. For that reason, it
is undesirable to deploy more than one kind of MDDB. The architecture specifies only an MDDB
which will support MDXml; most products do. As the architecture is implemented, the campus
will need to evaluate available offerings and select one.
Summary of differences from current architecture
This architecture differs from the installed architecture in three principal ways:
1. It includes support for multidimensional databases (OLAP structures). This is entirely
new.
2. The architecture totally eliminates desktop-based query tools in favor of server-based
data access delivered at the workstation through the browser and through office tools
such as Microsoft Excel. This completes a migration which began a while ago.
August 30, 2006
3. The architecture provides important new interfaces with other campus systems. These
are of two kinds:
a. Web services interfaces which allow business intelligence to be delivered in the
course of transactional systems. Some of the applications of this feature are
described in the Applications Architecture section, above.
b. A set of interfaces for delivering business intelligence through a standard portal.
These interfaces enable useful dashboards and alerts as well as reporting
portlets.
Support Architecture
Most of the critical success factors for the data warehouse identified during business interviews
concerned the need for executive engagement and support as well as the need for robust
participation by functional experts from all areas of the campus, academic and administrative.
These observations are consistent with the experience of organizations that have successfully
implemented enterprise data warehouses.
These needs drove the design of the support structure for the data warehouse, made up of skills
and organization as well as processes.
The full support architecture is documented in a separate document, Data Warehouse Support
Architecture, which is available at the EDW Roadmap website.
Following are the most important features of the support architecture.
Skills and organization
Data Warehouse Competency Center
The successful delivery of Data Warehouse systems requires the development of a support
architecture with the capabilities to:
1) Ensure investment and priorities in the data warehouse are aligned with campus
goals and objectives
2) Maintain an overall Data Warehouse design and architecture that promotes data
consistency, reliability and quality
3) Provide strong data stewardship and governance
4) Manage multiple, concurrent data warehouse projects
5) Maintain a balance between developing new systems (adding new data sources
and/or new user groups) and maintaining existing systems
6) Manage multiple sources of funding, with each contributing to building the data
warehouse infrastructure
These capabilities can be categorized into 4 areas:
Governance: General oversight of the Data Warehouse systems including
funding, prioritization of initiatives, resource commitments and
data stewardship.
Planning and Organizing: Business requirements and analysis, high-level design and
architecture, data standardization and conformity, end-user
training and support
Development: Development of, or significant enhancements to, data warehouse
systems
Support: Implementation and support of data warehouse systems
These capabilities are best developed and managed as a Competency Center. The Competency
Center consists of:
a) A virtual organization consisting of both business and IT resources that are
responsible for defining, building and maintaining a centralized data warehouse
environment
b) A set of processes used to manage the activities of the Competency Center
An underlying assumption of the Data Warehouse Competency Center is that data warehouses
are built incrementally. Therefore a program approach is required to manage multiple projects
adding new increments to the data warehouse, and to oversee the maintenance of existing data
warehouse applications.
Organizational Components of the Data Warehouse Competency Center
The Data Warehouse Competency Center consists of 7 teams:
Steering Committee: Providing funding and overall objectives for the Data Warehouse
Competency Center
Strategy Committee: Responsible for approving the Data Warehouse Roadmap, prioritizing
initiatives, championship of the data warehouse program and
sponsorship of data warehouse projects
Program Manager: Responsible for overall management and coordination of the Data
Warehouse Competency Center
Strategy Team: Responsible for defining data warehouse opportunities, business
cases and business analysis, high-level design and end-user support
Subject Matter Experts: Representatives of the academic and administrative departments
(who are end users of the data warehouse) responsible for providing
requirements and end-user support
Development Team: Responsible for development of new data warehouse increments as
defined by the Data Warehouse Roadmap
Operations Team: Responsible for deployment of new increments and support of
existing data warehouse systems
Organizationally, the Data Warehouse Competency Center can be depicted as:
[Organization chart. Boxes: Data Warehouse Steering Committee; Data Warehouse Strategy Committee; Data Warehouse Program Manager; Data Warehouse Strategy Team; Data Warehouse Development Team; Data Warehouse Operations Team; Subject Matter Experts. Groupings: Governance; Development; Support.]
Detailed roles and responsibilities for each team are described below:
Data Warehouse Steering Committee
The Data Warehouse Steering Committee is comprised of Cabinet-level officers representing all
divisions of the UC Berkeley organization. The responsibilities of the Steering Committee are:
o To provide the overall objectives for the data warehouse based on campus strategic
goals
o Provide funding for data warehouse initiatives
o Periodically review performance of data warehouse initiatives to determine if campus
goals are being met and to review funding and resource needs
o Provide guidance on regulatory requirements that impact the data warehouse
The Steering Committee is not involved in the day-to-day management of data warehouse
projects; however, the Data Warehouse Strategy Committee may escalate issues to the Steering
Committee if in their judgment the issue warrants attention at the Cabinet level.
Data Warehouse Strategy Committee
The Data Warehouse Strategy Committee consists of senior officers one level below Cabinet
level as well as the Data Warehouse Program Manager. The members of the Strategy
Committee represent all divisions of the UC Berkeley organization with an interest in and need for
reporting and analytics. The responsibilities of the Strategy Committee are:
Managing the Data Warehouse Roadmap
o Review recommendations for data warehouse initiatives from the Strategy Team
o Prioritize data warehouse initiatives (strategic, tactical and operational)
o Ensure data warehouse projects are aligned with the overall University's goals
o Ensure data warehouse systems are compliant with regulatory requirements
Project Initiation
o Approve projects and authorize use of funding
o Sponsor projects
Project Tracking
o Review project status, budget, issues and risks
Monitor Service Levels
o Approve service levels and review actual service levels achieved versus target
Communication
o Provide championship of data warehouse program
o Communication with other senior officers and management (of the data warehouse
initiatives and expected benefits)
o Escalate issues to the Steering Committee if necessary
Data Warehouse Performance Assessment
o Review and approval of critical processes used to manage data warehouse initiatives
o Assessment of data warehouse team(s) skills and capabilities
o Assessment of data warehouse program metrics
Data Stewardship
o Review and approve data retention and security policies
o Review and approve data standardization, definitions and naming conventions
o Define data quality standards
Data Warehouse Program Manager
The Data Warehouse Program Manager is a full-time position dedicated to coordinating and
managing all activities related to data warehouse initiatives. The Data Warehouse Program
Manager is a member of the Strategy Committee and directly oversees the work of the Data
Warehouse Strategy Team. The responsibilities of the Data Warehouse Program Manager are:
Facilitation of Steering and Strategy Committees
o Prepare and maintain charter for the Steering and Strategy Committees
o Prepare agenda for Steering & Strategy Committee meetings and distribute meeting
minutes
o Work with Data Warehouse Strategy Team on presentations for Committee meetings
Manage Data Warehouse Capability Maturity Model
o Conduct and maintain assessment of current skills
o Review and approve DW processes
o Conduct and maintain assessment of data warehouse processes
o Assess current capabilities against established industry best practices
o Define target state and create gap analysis
o Develop training plan and initiatives to reach target state
Alignment with other IS&T Initiatives
o Coordinate and communicate with IS&T Program Office
Program Management
o Prepare and maintain charter for the Data Warehouse Competency Center
o Define data warehouse objectives and metrics
o Define the methodology for prioritization of data warehouse initiatives
o Review and approve the project management methodology
o Provide project status reporting
o Budget management
o Ensure best practices are followed for design and development of Data Warehouse
projects
Data Warehouse Strategy Team
The Data Warehouse Strategy Team consists of Business and Data Analysts, End-user Support
Analysts, the Data Warehouse Data Architect and the Data Warehouse Technical Architect. The
team is responsible for defining requirements, high-level design, project management and
end-user support and training for all data warehouse initiatives. The Strategy Team works closely
with a team of Subject Matter Experts, representatives from the campus or administrative
departments, who help define and prioritize requirements. Specific responsibilities of the
Strategy Team are:
Creating/Maintaining the Data Warehouse Roadmap
o Identification of data warehouse opportunities
o Develop business cases and cost-benefit analysis
o Providing recommendations to the Strategy Committee for strategic, tactical and
operational projects
Project Initiation
o Requirements gathering
o Business analysis
o Develop project plan
High-Level Design
o Solutions architecture
o Service level agreements
o Conceptual and logical data models
o Source data mapping
Project Management
o Status reporting
o Issues management
o Risk management
Technology Evaluation
o Evaluate data warehouse toolsets and technologies
o Recommend technologies that can add business value and enable achievement of the
University's objectives
Data Quality Management
o Define data definitions/standards
o Data retention policy
o Security policy
o Data quality management
User Adoption and Training
o Business change management
o User adoption planning
o End-user support and training
Data Warehouse Development Team
The Data Warehouse Development Team is responsible for adding new functionality and subject
areas to the data warehouse. Typically the data warehouse is developed as a series of
increments defined by the Strategy Team in the Data Warehouse Roadmap. The Development
Team does not provide the day-to-day operations for the Data Warehouse systems.
Responsibilities of the Development Team are:
Detailed Design
o Extract, Transform and Load (ETL) functional specifications
o Physical database design
o Metadata package design
o Report/cube specifications
o Security
o Create requirements traceability matrix
o Create test plans
Development
o Create and maintain development standards
o ETL development
o Database setup/scripting
o Metadata development
o Report/cube development
o Portal customization
o Unit testing
o Documentation
Data Warehouse Operations Team
The Data Warehouse Operations Team is responsible for the day-to-day monitoring,
maintenance and support of data warehouse systems. Clear separation of the duties of
operations and development limits the impact of production management and issues on the
development of new data warehouse increments. Specific responsibilities of the Operations
Team are:
Validation
o QA/Integration testing
o User acceptance testing
o Turnover/knowledge transfer to Operations Team
o Turnover/knowledge transfer to Strategy Team
Deployment
o Migrate new increments to the production environment
o Implement security policy
o Source code management
o Create operations documentation
o Create end-user training tools
Operations
o Scheduling
o Data validation
o Backups
o Application logging (front end and back end)
o Management of production issues
o Root cause analysis
Performance Management
o Ongoing monitoring
o Reporting on service levels
o Tuning and configuration
Data Quality Management
o Implement data quality, retention and security policies
o Measurement of data for accuracy, reliability and consistency
Enhancements
o Front-end production enhancements (presentation layer, reports, OLAP)
o Back-end production maintenance (ETL, databases, data warehouse, data marts)
Infrastructure Management
o Maintenance of Dev/QA/Prod environments
o Installation of data warehouse tools and software
o Disaster recovery procedures
Capacity Management
o Ongoing monitoring
o Capacity planning
End-user Support
o Technical help desk
o Add/change/delete users
o Technical end-user training
Processes
The Data Warehouse Competency Center requires a standard set of processes to:
o Ensure Data Warehouse systems are initiated, implemented and supported in a
consistent manner
o Measure performance of the Data Warehouse Competency Center
o Manage data ownership and stewardship
o Manage the skills and processes needed for continuous improvement
The core processes required by the Data Warehouse Competency Center are:
1) Project Initiation
2) High-Level Design
3) Development
4) Validation
5) Deployment
6) User Adoption
7) End User Support
8) Data Warehouse Program Management
9) Data Warehouse Performance Assessment
10) Data Stewardship
Responsibilities for ownership and participation in these processes are defined in the RACI
matrix below, where:
R  Indicates the party is responsible for the process
A  Indicates the party is accountable for the process
C  Indicates the party is consulted on the process
I  Indicates the party is informed of the process
[RACI matrix. Columns: Operations Team; Development Team; Strategy Team; Program Manager; Strategy Committee; Steering Committee. Rows shown: Project Initiation; Validation; Deployment; User Adoption; Data Stewardship. The individual cell assignments are not recoverable here.]
A detailed description of these processes is provided in the Support Architecture document
cited above.
Summary of differences from current architecture
An enterprise data warehouse needs processes and supporting organizations to
coordinate evolution of the warehouse with overall campus priorities and needs. The
addition of those processes and organizations represents the biggest difference between
the support architecture just described and processes currently in place.
In particular, the two high-level committees which provide funding and objectives and
prioritize development work are important components not yet in place on the campus.
Having a single program manager responsible for the overall management and
coordination of the DW competency center on behalf of the whole campus is another
important change.
Also new is having subject matter experts from the various academic and administrative
functions formally engaged in clarifying requirements and helping to ensure data
quality.
Finally, the data warehouse competency center described above is a single virtual team
comprised of personnel with functional and IT specialties. This represents an evolution
from current practice, as does distinguishing development and operations teams.
Appendices
UC Berkeley Fact/Dimension Data Matrix
Architecture Requirements Gap Analysis
EDW: Key data marts by subject area and their shared dimensions
[Matrix of data marts (rows) against shared dimensions (columns). The cell marks showing which dimensions each data mart uses are not recoverable here; the row and column labels follow.]
Subject areas: Teaching; Student; Human Resources; Sponsored Research; Spending; Recharge Billing; Facilities; Payments
Data marts: Teaching assignment; Classroom space assignment; Student class enrollment; Wait list enrollment; Degree; Student course evaluation; Student survey; Undergraduate admissions; Graduate admissions; Semester snapshot; Financial aid awards; External test results; Activity membership; Grant proposal; Grant award; Contacts; External events; Memberships; Pledges; Gift receipt; Recruitment; Personnel actions; Training Activity; Payroll; Purchasing; Permanent budget; (Temporary) budget; Balances; Pre-encumbrances; Encumbrances; Revenue/expense journals; Communications services; IST services; Physical plant; Other recharge areas; Space assignment; Vendor payments; Financial aid payments; Receipts
Dimensions shown: Product/Commodity; COA-Flexfield; COA-Project; COA-Program; COA-Account; COA-Business unit; Fund; Job Requisition; Job Classification; Job Applicant; Facility; Vendor (descriptive info of individuals & orgs)
Description of dimensions:
Following are definitions of the dimensions listed in the matrix. (In the following, a dimension name is in boldface and related
sub-dimensions are in italic.)
Date.
The date dimension provides a coordinated calendar for the university, relating individual dates to:
o the academic calendar, including semester and summer session start and end, census dates, etc.
o the financial calendar, including fiscal periods, processing cut-off dates, etc.
A related sub-dimension is the set of possible class meeting schedules, course meeting schedule.
Facility.
Descriptive information about rooms and buildings.
Course.
This dimension describes all the courses offered by the campus as well as their individual sections, along with all of the
relationships among them. Course offerings are linked to the organizations which offer them and to the major courses of study
to which they contribute.
Student.
Describes demographic and academic characteristics of anyone who has ever been admitted to the university. An important
set of descriptive attributes of students is academic status information, such as registration status, residency, degree goals and
major. Together these form a subdimension whose information is refreshed once or more during an academic term for
continuing students. This more detailed subdimension is identified as:
Student academic status. (This is largely the information collected under the title semester event in the student data
warehouse pilot.)
External institution.
Information about all the secondary schools and higher-education institutions which interact with the campus, including all high
schools and junior colleges which prepare undergraduate applicants as well as other higher-education institutions. Closely
related is the following subdimension:
o Testing agency. Identification and other information about organizations which furnish standardized test results, such as
SAT scores, to the campus.
Research sponsor.
Descriptive information about organizations such as government agencies, foundations and corporations, which sponsor
research.
Employee
Identifying and descriptive information, such as demographic information, about present and past employees of the campus,
including faculty, staff and all manner of affiliates, such as volunteer contributors.
Appointment.
Facts about an employee job, including such information as title, job code, reporting relationship and leave status. Because it is
common for employees, particularly faculty, to have several appointments, some key information about employment status,
salary grade, etc., is only accurate by appointment. The appointment dimension needs to be coordinated with the
employee dimension, as changes to the overall employment status of an employee cascade to that employee's various
appointments. Appointments for permanently-budgeted positions are linked to the budget dimension Budgeted staff
position.
Job Applicant.
Demographic and other information about persons who have applied for employment on the campus.
Job Classification.
Information about defined jobs, including relationships among them, such as job families and progression ladders.
Job Requisition.
Information about individual job requisitions including coded skills requirements, dates, etc.
Employee training.
Information about campus training programs, including specific training offerings, related programs and training hierarchies.
Supporter.
Descriptive information about donors and potential donors (prospect) to the campus, including identifying and demographic
information, and fund-raising attributes. Closely related to this dimension is the following subdimension:
o Supporter student interest
Describes the relationship of a supporter to an individual student; e.g., parent of the student, along with other related
descriptive information.
Campaign.
Descriptive information about organized fund-raising programs.
Fund.
Information about sources of funding for the campus, including BFS funds and fund hierarchies. Includes linkages when
appropriate to Sponsors and Supporters.
Organization.
Information about campus organizations. Includes all internal organizations, including cross-functional organizations, and
their interrelationships and reporting hierarchies.
COA-Account.
Information about GL accounts and account hierarchies, including information needed to reconcile BFS accounts with UCOP
accounts.
COA-Program.
Identified programs from BFS, along with any hierarchies or inter-program relationships.
COA-Project.
Information about projects used to identify financial information, including descriptions and relationships among those
projects.
COA-Flexfield.
Any descriptive information about flexfields used to classify financial information. This is probably a slight dimension, since
information about flexfields is local to individual organizations and may not be well documented.
Vendor.
Descriptive information about parties (organizations and individuals) paid through the campus payments system, including
hierarchical relationships among them. Many employees are vendors.
Product/commodity.
Description of the products and services bought by the university through the purchasing system.
Service element.
This is a place holder for a family of dimensions describing the service elements for which recharge organizations charge, such
as telecommunications services. There is one or more such dimension for each recharge organization and no expectation that
they are aggregated into a single linked dimension.
The following dimensions describe persons who do not necessarily have UIDs, student ids or employee ids.
Job applicant
Supporter.
Describes a party (person or organization) identified as an actual or potential supporter of the campus, including
alumni. Most supporters are persons but supporters which are organizations represent an important source of
donations. Supporters often are related to students; in the most frequent case by having been Cal students.
2. Campus organization.
Includes all internal organizations, including cross-functional organizations, and their interrelationships and reporting
hierarchies. Several degree programs and majors represent joint offerings of several academic organizations and are not
themselves organizations.
Last saved on 9/13/06
Sponsor.
Secondary school.
Vendor
4. Date.
The date dimension provides a coordinated calendar for the university, relating individual dates to:
the academic calendar, including semester and summer session start and end, census dates, etc.
the financial calendar, including fiscal periods, processing cut-off dates, etc.
Because of university practices of making some important changes retroactively and because of requirements for reporting to
UCOP and some external stakeholders according to census dates, many data marts have two links to the date dimension:
effective date (the date as of which information is considered to be final)
reported date (the date information was available to be reported externally)
In addition, many time-varying dimensions, such as student and employee, will have different values for different
combinations of effective and reported date.
For example, consider the following course enrollments:
In Fall 2005 there is a course and section AA11. At the beginning of the term, September 7, 2005, it has two student
enrollments, for these two students:
student 123 who is a resident graduate student
student 456, who is a non-resident graduate student.
On October 1, student 456 is granted resident status, retroactive to the beginning of the semester.
On November 1, student 456 drops the class.
A count of course enrollments by residency as reported on date n should return three different values for n = 9/10, 10/10 and
11/10, as follows:
Date reported               9/10    10/10    11/10    Today
Resident enrollments          1       2        1        1
Non-resident enrollments      1       0        0        0
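The example can be checked with a small sketch that replays the four events and counts enrollments as they would have been reported on a given date. The event-log representation, the `Event` fields and the function name are assumptions made for illustration; the document does not prescribe how effective and reported dates are stored.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    reported: date       # date the information became available for reporting
    effective: date      # date as of which the information applies
    student: str
    kind: str            # "enroll", "residency" or "drop"
    residency: str = ""  # used by "enroll" and "residency" events


# Section AA11, Fall 2005, as described in the text.
EVENTS = [
    Event(date(2005, 9, 7), date(2005, 9, 7), "123", "enroll", "resident"),
    Event(date(2005, 9, 7), date(2005, 9, 7), "456", "enroll", "non-resident"),
    Event(date(2005, 10, 1), date(2005, 9, 7), "456", "residency", "resident"),
    Event(date(2005, 11, 1), date(2005, 11, 1), "456", "drop"),
]


def enrollments_by_residency(events, as_reported, as_effective=None):
    """Count enrollments by residency using only what was known by as_reported.

    as_effective limits the events to those in effect by that date;
    it defaults to as_reported.
    """
    if as_effective is None:
        as_effective = as_reported
    residency = {}
    enrolled = set()
    for ev in sorted(events, key=lambda e: e.reported):
        if ev.reported > as_reported or ev.effective > as_effective:
            continue
        if ev.kind == "enroll":
            enrolled.add(ev.student)
            residency[ev.student] = ev.residency
        elif ev.kind == "residency":
            residency[ev.student] = ev.residency
        elif ev.kind == "drop":
            enrolled.discard(ev.student)
    return Counter(residency[s] for s in enrolled)
```

Calling `enrollments_by_residency(EVENTS, date(2005, 10, 10))` counts both students as residents, because the retroactive change reported on October 1 is already known. Passing an earlier `as_effective`, such as the start of term, gives a census-style view that ignores the later drop.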
5. Fund.
One of the most-highly shared dimensions is Fund, which serves to associate activity, such as research spending, employment,
other spending, and financial aid payments, with funding sources and fund-producing activities. For example, funds associate
research spending with sponsors, research proposals and grants. Likewise funds associate supporters and gifts with the
financial aid payments and other spending which represent the use of grants.
The architecture effort examined these four important components of the data
warehouse:
1. Applications
2. Data
3. Technology and security
4. Support (processes and organization)
BAIRS
BIS
Cal Profiles
The pilot student data warehouse
FASDI
The Office of Student Research database/ reporting file
Architecture Requirements:
Starting with the functional requirements articulated in the campus interview process and
recorded in the requirements document referenced above, the architecture team
analyzed resulting architectural requirements in the four dimensions of applications,
data, security/technology and process/organization. The resulting architecture
requirements are summarized in the EDW Architecture Requirements documents, which
are available at the following link:
https://bearshare.berkeley.edu/sites/RAPO/EDW/DWStrat/Architecture/Document
Library1/Forms/AllItems.aspx
Based on those requirements, the team analyzed existing warehouse capabilities, with
particular emphasis on BAIRS, which is the platform which will serve as the basic
foundation from which to build the EDW. Requirement gaps were identified and
described for each of the components of the EDW; they are documented in individual
documents at the architecture website:
https://bearshare.berkeley.edu/sites/RAPO/EDW/DWStrat/Architecture/Document
Library1/Forms/AllItems.aspx
Updated 4/14/06
Architecture Gaps:
1. Applications gaps.
Standard reporting and query support for subject areas other than G/L,
budgeting, human resources.
The data warehouse will grow by adding new data subject areas, such as
teaching and course enrollments, admissions and student registration, donors
and gifts, purchases and recharges. Along with each new data increment, the
warehouse should add new standard reports and new facilities, such as view
layers and drill-downs, to make that data useful.
Reporting refinements.
Some users need better facilities for scheduling routine reports. Some need
improvements in the way the reporting tool prints reports.
2. Data gaps.
Incorporation of plan data (e.g. salary and budget projections) into the data
warehouse.
Budget tracking is one of the most widely-used applications of the data
warehouse. Reporting current status against budget requires summarizing
actual expenses along with known commitments and spending plans. The
current budget warehouse, BAIRS, contains some but not all of the information
about known future expenses. BAIRS budget reports include purchasing
commitments (requisitions and purchases) but don't include other projected
expenses. Users would like to see foreseeable expenses, such as large
software license costs or travel expenses, along with other commitments and
actual expenses in order to know where they stand with budget.
This is a particularly important need in tracking budgets for research grants,
which often have multi-year budgets. In particular, it's important to include
projected salary costs along with actual expenses and commitments to
understand current budget status of research grants.
Adding projections and salary encumbrances to the data warehouse depends on
first creating an application which allows users to record them. This will probably
be most easily developed as a custom function within the Berkeley Financial
System (BFS).
Security enhancements.
New data subject areas, such as student and employee data, have more complex
requirements for controlling access. In particular, these subject areas
contain a great deal of sensitive, personally identifiable data that is highly
regulated by government statutes and University policy. Meeting the
requirements of these subject areas will require an expanded access-control
system.
OLAP functionality.
As mentioned in Application gaps, above, some users who need to analyze large
data sets rapidly along various dimensions will benefit from using a special OLAP
database management product. Users such as budget analysts and labor
negotiators have these needs. In addition, many such users need to be able to
build models created by manipulating data from the warehouse. These needs,
too, are met by OLAP technology. Some key requirements include the following:
4. Support/Process/Organization gaps.
Some of the biggest, most important gaps between the EDW specified by campus
interviews and the existing data warehouse involve organization, skills and
processes. Most interview subjects commented on the need for changes in this
dimension of the warehouse. The EDW serves the whole campus, both academic units
and administration, and delivers information drawn from across the activities of the
campus. As a result, it needs cross-campus leadership, processes which effectively
draw out the collective institutional knowledge of the campus, and a skilled,
professional warehouse team. Following are some specific gaps:
EDW governance.
The EDW is a multi-year investment which needs to be directed and paced by
overall campus priorities. It is important that the executive leadership of the
campus, including both academic and administrative leadership, participate in
setting priorities and success criteria for the effort. The same leaders must
ensure that the warehouse effort has a budget appropriate to those priorities. A
high-level executive steering committee is required, as well as a delegated
executive owner. This steering group must charter the work of the rest of the
EDW team, ensure that the team is competent to meet that mission, and
appropriate the needed funds.
In addition, the executive steering group needs to ensure that the whole campus
is participating appropriately in evolving the warehouse, either by participating in
the operational management of the warehouse effort or by supplying data, data
knowledge and quality oversight.
BI Competency Center.
The data warehouse is a campus information tool with many components.
Delivering it requires a blend of campus business knowledge, management and
technical skills. These skills are most effective when collected together and
managed as a single organization. This collection of skills is often called a
Business Intelligence Competency Center. What's most important is that all the
required skills are coordinated as a single set and harnessed to help the whole
spectrum of campus users get their information needs met. Having these skills
operate as a single organization helps ensure that the process is needs-driven; it
also promotes uncovering options for meeting those needs.
An effective competency center should include the following components:
Management.
Development.
This includes the skills and personnel needed to design and implement
new releases, including functional requirements definition, technical
development and change management.
Architecture.
Delivering increments of value requires a roadmap of projects which make
the required changes to data, applications, security and processes.
Planning that roadmap requires architectural understanding and design of
the whole program.
Release planning.
Implementing a new release requires change management which begins
with identifying new user populations and their needs and progresses
through measurement of the value of new capabilities. In addition,
substantial project management is required to coordinate the change
management along with the development of technical components, user
training and development of associated documentation and metadata.
This is a different process from the day-to-day management of the
operating warehouse.