Data warehouse

Published by Utkarsh Srivastava


Published by: Utkarsh Srivastava on Nov 01, 2011
Copyright:Attribution Non-commercial







Data warehouse

From Wikipedia, the free encyclopedia

Data Warehouse Overview

In computing, a data warehouse (DW) is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting. A data warehouse maintains its functions in three layers: staging, integration, and access. Staging is used to store raw data for use by developers. The integration layer is used to integrate data and to provide a level of abstraction from users. The access layer is for getting data out for users. This definition of the data warehouse focuses on data storage. The data from the main sources is cleaned, transformed, catalogued and made available for use by managers and other business professionals for data mining, online analytical processing, market research and decision support (Marakas & O'Brien 2009). However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.
Benefits of a data warehouse
A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
• Improve data, by providing consistent codes and descriptions, flagging or even fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the data's source.
• Restructure the data so that it makes sense to the business users.
• Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
• Add value to operational business applications, notably customer relationship management (CRM) systems.

History

The concept of data warehousing dates back to the late 1980s [1] when IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations it was typical for multiple decision support environments to operate independently. Though each environment served different users, they often required much of the same stored data. The process of gathering, cleaning and integrating data from various sources, usually from long-term existing operational systems (usually referred to as legacy systems), was typically in part replicated for each environment. Moreover, the operational systems were frequently reexamined as new decision support requirements emerged. Often new requirements necessitated gathering, cleaning and integrating new data from "data marts" that were tailored for ready access by users.

Key developments in early years of data warehousing were:

• 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.[2]
• 1970s: ACNielsen and IRI provide dimensional data marts for retail sales.[2]
• 1970s: Bill Inmon begins to define and discuss the term Data Warehouse.
• 1975: Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL. It was the first platform specifically designed for building Information Centers (a forerunner of contemporary Enterprise Data Warehousing platforms).
• 1983: Teradata introduces a database management system specifically designed for decision support.
• 1983: At Sperry Corporation, Martyn Richard Jones defines the Sperry Information Center approach, which, whilst not being a true DW in the Inmon sense, did contain many of the characteristics of DW structures and process as defined previously by Inmon, and later by Devlin. First used at the TSB England & Wales.
• 1984: Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
• 1988: Barry Devlin and Paul Murphy publish the article An architecture for a business and information system in IBM Systems Journal, where they introduce the term "business data warehouse".
• 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
• 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
• 1992: Bill Inmon publishes the book Building the Data Warehouse.[3]
• 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
• 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.[4]
• 2000: Daniel Linstedt releases the Data Vault, enabling real-time auditable data warehouses.

Normalized versus dimensional approach for storage of data

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalized approach. Supporters of the dimensional approach, referred to as “Kimballites”, believe in Ralph Kimball’s approach, in which the data warehouse should be modeled using a dimensional model/star schema. Supporters of the normalized approach, also called the 3NF model, are referred to as “Inmonites” and believe in Bill Inmon's approach, in which the data warehouse should be modeled using an E-R model/normalized model.

In a dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.

A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. Dimensional structures are easy to understand for business users, because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization’s business processes and operational system whereas the dimensions surrounding them contain context about the measurement (Kimball, Ralph 2008).

The main disadvantages of the dimensional approach are:
1. In order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and
2. It is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises the result is dozens of tables that are linked together by a web of joins. Furthermore, each of the created entities is converted into separate physical tables when the database is implemented (Kimball, Ralph 2008). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to:
1. join data from different sources into meaningful information and then
2. access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.

It should be noted that both normalized and dimensional models can be represented in entity-relationship diagrams, as both contain joined relational tables. The difference between the two models is the degree of normalization.
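The sales example used for the dimensional approach can be sketched as a small star schema. This is only an illustration built with Python's standard sqlite3 module: the table names (dim_date, dim_customer, dim_product, fact_sales), the column names, and the sample rows are assumptions made for the sketch, not part of any published model.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, order_date TEXT);
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
        CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_number TEXT);
        -- Facts: numeric measurements, keyed by the surrounding dimensions.
        CREATE TABLE fact_sales (
            date_key         INTEGER REFERENCES dim_date(date_key),
            customer_key     INTEGER REFERENCES dim_customer(customer_key),
            product_key      INTEGER REFERENCES dim_product(product_key),
            quantity_ordered INTEGER,
            price_paid       REAL
        );
    """)


def load_sample(conn):
    """Insert a few made-up rows so the query below has something to return."""
    conn.execute("INSERT INTO dim_date VALUES (1, '2011-11-01')")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "P-100"), (2, "P-200")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 3, 30.0), (1, 1, 2, 1, 25.0), (1, 2, 1, 2, 20.0)])


def sales_by_customer(conn):
    """Summarize the facts using context taken from one dimension."""
    return conn.execute("""
        SELECT c.customer_name, SUM(f.quantity_ordered), SUM(f.price_paid)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.customer_name
        ORDER BY c.customer_name
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample(conn)
print(sales_by_customer(conn))  # [('Acme', 4, 55.0), ('Globex', 2, 20.0)]
```

A typical analytic query joins the fact table to only the dimensions it needs, which is one reason retrieval from a dimensional model tends to be quick and easy for business users to follow.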

These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph 2008).

Top-down versus bottom-up design methodologies

Bottom-up design

Ralph Kimball, a well-known author on data warehousing,[5] is a proponent of an approach to data warehouse design which he describes as bottom-up.[6]

In the bottom-up approach data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note, though, that in the Kimball methodology the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.

Data marts contain, primarily, dimensions and facts. Facts can contain atomic data and, if necessary, summarized data. The single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture".[7] The data warehouse bus architecture is primarily an implementation of "the bus", a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.

The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "Drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.

Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".

Some consider it an advantage of the Kimball method that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including the bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method lends itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the "Sales" department, by building a Sales data mart. Upon completion of the Sales data mart, the business might then decide to expand the warehousing activities into, say, the "Production" department, resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrable is that they share the same "bus", that is, that the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and that the individual data marts link that information from the bus. Note that this does not require 100% awareness from the onset of the data warehousing effort; no master plan is required upfront. The Sales data mart is good as it is (assuming that the bus is complete) and the Production data mart can be constructed virtually independent of the Sales data mart (but not independent of the bus).

If integration via the bus is achieved, the data warehouse, through its two data marts, will not only be able to deliver the specific information that the individual data marts are designed to deliver, in this example either "Sales" or "Production" information, but can also deliver integrated Sales-Production information, which, often, is of critical business value. This integration can be achieved in a flexible and iterative fashion.

Top-down design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise.[7] Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse. In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities.

Inmon states that the data warehouse is:

• Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.
• Non-volatile: Data in the data warehouse are never over-written or deleted; once committed, the data are static, read-only, and retained for future reporting.
• Integrated: The data warehouse contains data from most or all of an organization's operational systems and these data are made consistent.
• Time-variant

The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage to the top-down methodology is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.[7]

Hybrid design

Data warehouse (DW) solutions often resemble hub and spoke architecture. Legacy systems feeding the DW/BI solution often include customer relationship management (CRM) and enterprise resource planning solutions (ERP), generating large amounts of data. To consolidate these various data models, and facilitate the extract transform load (ETL) process, DW solutions often make use of an operational data store (ODS). The information from the ODS is then parsed into the actual DW. To reduce data redundancy, larger systems will often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW solution.

It is important to note that the DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modelling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solution from a BI point of view. The hybrid architecture allows a DW to be replaced with a master data management solution where operational, not static, information could reside.

The Data Vault Modeling components follow hub and spoke architecture. This modeling style is a hybrid design, consisting of the best of breed practices from both 3rd normal form and star schema. The Data Vault model is not a true 3rd normal form, and breaks some of the rules that 3NF dictates be followed. It is, however, a top-down architecture with a bottom-up design. The Data Vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible, and when built it still requires the use of a data mart or star schema based release area for business purposes.
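The drill-across procedure described for the bottom-up approach (summarize each fact along the shared conformed dimension keys, then join the summaries) can be sketched as follows. This is a minimal illustration using Python's standard sqlite3 module; the two fact tables, the dim_product conformed dimension, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Step 1: group each fact table along the conformed dimension key.
# Step 2: join the grouped results on that key.
DRILL_ACROSS_SQL = """
    WITH sales AS (
        SELECT product_key, SUM(units_sold) AS units_sold
        FROM fact_sales
        GROUP BY product_key
    ),
    production AS (
        SELECT product_key, SUM(units_made) AS units_made
        FROM fact_production
        GROUP BY product_key
    )
    SELECT p.product_name, s.units_sold, m.units_made
    FROM sales AS s
    JOIN production AS m ON m.product_key = s.product_key
    JOIN dim_product AS p ON p.product_key = s.product_key
    ORDER BY p.product_name
"""


def build_marts(conn):
    """Two data marts that share one conformed dimension (dim_product)."""
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
        CREATE TABLE fact_production (product_key INTEGER, units_made INTEGER);
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Widget"), (2, "Gadget")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 5), (1, 7), (2, 4)])
    conn.executemany("INSERT INTO fact_production VALUES (?, ?)",
                     [(1, 20), (2, 6), (2, 4)])


def drill_across(conn):
    """Return integrated Sales-Production rows, one per product."""
    return conn.execute(DRILL_ACROSS_SQL).fetchall()


conn = sqlite3.connect(":memory:")
build_marts(conn)
print(drill_across(conn))  # [('Gadget', 4, 10), ('Widget', 12, 20)]
```

Summarizing before joining matters: joining the raw fact rows first would multiply sales rows by production rows for the same product and inflate the totals. The inner join used here drops products that appear in only one mart; whether that is desirable depends on the report.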

Data warehouses versus operational systems

Operational systems are optimized for preservation of data integrity and speed of recording of business transactions through use of database normalization and an entity-relationship model. Operational system designers generally follow the Codd rules of database normalization in order to ensure data integrity. Codd defined five increasingly stringent rules of normalization. Fully normalized database designs (that is, those satisfying all five Codd rules) often result in information from a business transaction being stored in dozens to hundreds of tables. Relational databases are efficient at managing the relationships between these tables. The databases have very fast insert/update performance because only a small amount of data in those tables is affected each time a transaction is processed. Finally, in order to improve performance, older data are usually periodically purged from operational systems.

Data warehouses are optimized for speed of data analysis. Frequently data in data warehouses are denormalised via a dimension-based model. Also, to speed data retrieval, data warehouse data are often stored multiple times: in their most granular form and in summarized forms called aggregates. Data warehouse data are gathered from the operational systems and held in the data warehouse even after the data has been purged from the operational systems.

Evolution in organization use

These terms refer to the level of sophistication of a data warehouse:

• Offline operational data warehouse: Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
• Offline data warehouse: Data warehouses at this stage are updated from data in the operational systems on a regular basis and the data warehouse data are stored in a data structure designed to facilitate reporting.
• On-time data warehouse: Online integrated data warehousing represents the real-time stage, in which data in the warehouse is updated for every transaction performed on the source data.
• Integrated data warehouse: These data warehouses assemble data from different areas of business, so users can look up the information they need across other systems.[8]

Sample applications

Some of the applications data warehousing can be used for are:

• Decision support
• Trend analysis
• Financial forecasting
• Churn prediction for telecom subscribers, credit card users, etc.
• Insurance fraud analysis
• Call record analysis
• Logistics and inventory management
• Agriculture [9]

Future

Data warehousing, like any technology, has a history of innovations that did not receive market acceptance.[10]

A 2009 Gartner paper predicted these developments in the business intelligence/data warehousing market:[11]

• Because of lack of information, processes, and tools, through 2012, more than 35 percent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.
• By 2012, business units will control at least 40 percent of the total budget for business intelligence.
• By 2010, 20 percent of organizations will have an industry-specific analytic application delivered via software as a service as a standard component of their business intelligence portfolio.
• In 2009, collaborative decision making will emerge as a new product category that combines social software with business intelligence platform capabilities.
• By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.

Numerous roles and responsibilities will need to be acceded to in order to make data warehouse efforts successful and generate return on investment. For the technical personnel (application programmer, system administrator, database administrator, data administrator), it is recommended that the following roles be performed full-time by dedicated personnel as much as possible and that each responsible person receive specific Data Warehouse training. The data warehouse team needs to lead the organization into assuming their roles and thereby bringing about a partnership with the business. Management also needs to make actionable plans out of these directives and make sure the staff executes on them. The following are team and extended team member composition and roles and responsibilities suggestions.

Manager/Director

The responsibilities of the Manager or Director of the Data Warehouse team should be to:
• Ensure support for the data warehouse program at the highest levels of the organization
• Understand high level requirements of the business
• Present the business with the possibilities available to them through data warehousing
• Staff the team
• Establish and ensure adherence to a set of guiding principles for data warehousing
• Communicate key milestone status to IT management
• Ensure the remainder of the team accede to their responsibilities as enumerated below
• Liaise with strategic vendors
• Establish partnerships with key IT partners in support of data warehousing initiatives

Project Manager

The perceived strength of data warehousing within an organization will be the sum of the strength of the Project Managers. Project Managers must deliver commitments and must deliver on time. They will do this by culling resources from within the data warehouse team and from consultancy as necessary and establishing partnerships with other internal support organizations required to support a data warehouse iteration. A Project Manager delivers by:
• Maintaining a highly detailed plan and obsessively caring about the progress on it.
• Applying personal skill and judgment to everything on the project. This is a real value-add of the Project Manager. It is the Project Manager’s job to exercise relevant discretion.
• Matching team members’ skills and aspirations as closely as possible to tasks on the plan.
• Tracking all relevant metrics for each iteration:
o Project Plan milestones

o Issues list
o Adherence to change control practices
o Adherence to source code control practices
o Documentation fit for users and support personnel
o Architectural components adherence to fit for purpose and standards
o Regression testing performed and tests updated based on changes
o Team members fit for tasks and career-enhanced

Chief Architect

The Manager/Director will need to rely on a Chief Architect (or similar title) position, as one of his/her direct reports, to work on complex issues of architecture, modeling, and tools. A business technologist, s/he will meet business objectives with existing or emerging technologies and work on issues with broad technical or strategic implications. The person should be able to quickly qualify as an authority in data warehousing within the organization and have mastery of the data warehousing paradigm, both current and emerging technologies. His/her knowledge of the business needs to be just as great. This person would have significant interface with the internal clients and increase their confidence in the data warehouse organization.

Data Steward (in user community - by subject area)

Data Stewardship appointment should be made at the subject area to management-level personnel in the business areas most impacted by the subject area. For the core subject areas, this will be the person on the working team by appointment of the executive sponsor. Data stewards, unlike other users, have read and write access to their area of stewardship. Other users have read-only access to the Data Warehouse areas that they have been approved to read by the data stewards.

The responsibilities of a Data Steward include:
1. Arbitrating the transformation rules
2. Entering data or determining data population method
3. Verifying the data after load
4. Contribution of the business metadata
5. Approving new users
6. Supporting the user community on the data
7. Participation on a decision support steering committee
8. Data Quality

A brief description of each of these responsibilities follows.

The Data Steward is responsible for the transformation rules used in the process of moving data from source to target for sourced data. The specific tasks are:

• Assuring transformation rules keep the data meaningful and consistent
• Arbitrate differences of opinion and different interpretations of value as to how the data will be represented in the Data Warehouse
• Make the call on the initial rules and all subsequent changes

The Data Steward is responsible for the timely and accurate population of data in the area of stewardship. This could mean manual entry through the developed user interface or the selection of the system that will provide the data (i.e., the "source"). The specific tasks are:
• Determine the systems that will feed the data to the data warehouse
• Ensure feeds are developed to feed the data into the data warehouse as soon as it is practical to do so on a regular basis

©McKnight Associates, Inc. 2000

The Data Steward, as the responsible party for the quality of the data, will verify the data after it is loaded from the operational sources. The specific tasks are:
• Confirm the data was loaded and that the transformational rules were properly applied
• Formally give the approval to the greater user community for query and analysis of the data

The Data Steward will contribute the business metadata to the metadata repository. The business metadata consists of the business definition written in terms the business can understand. This is perhaps the most important of the Data Steward's tasks.

The Data Steward will broker requests for new usage of the Data Warehouse. The specific tasks are:
• Knowledge of the data sources, transformation rules, and uses of the data for the area of stewardship
• Knowledge of the workload limitations of the Data Warehouse system for the area of stewardship
• Approve new users and their authority levels (usually read only)

It should be a requirement for new users of the Data Warehouse to undergo training internally on both the data model and the data access tool.

The Data Steward will be responsible for the training and ongoing support of the user community on the data model and the data itself. The specific tasks for the Data Steward are:
• Participating in the regular training sessions given to new users
• Help-desk style support of the user community on a regular basis

Meetings of the decision support steering committee should be scheduled regularly to make strategic discernment and prioritization over the major additions of usage, subject areas, and data sources to the Data Warehouse. The specific tasks for the Data Steward for the committee are:
• Meeting attendance and leadership
• Judgment on the issues presented

The consistency, accuracy, and timeliness of the data are the responsibility of the data steward. The data steward’s area of stewardship must receive data according to recognized, systemic methods. Data quality issues must be addressed immediately. Unattended perceptions of uncleanliness quickly tear down the value of the Data Warehouse.

End User

The Data Warehouse is being built for End User query and reporting. Nonetheless, there are responsibilities that an end user must meet to get maximum benefit from the Data Warehouse.

The responsibilities of an End User include:
1. Attending training before receiving an ID and password on the Data Warehouse system
2. Browsing the technical and business metadata for information on data sourcing and data definitions
3. Providing feedback to the Data Warehouse team

The End User must attend training to familiarize himself or herself with the Data Warehouse environment, especially the metadata, the query tool, and the data model. The metadata will consist of tables containing data about the user data. The End User should find most of his or her data-related questions answered in the metadata.

The Data Warehouse will potentially disrupt but should ultimately increase the organization’s ability to generate sales and lower costs. To get to that point, however, requires that the users actively use the Data Warehouse as instructed and, since it should be a user-centric Data Warehouse, actively provide feedback to the Data Warehouse team on issues of:
• Performance
• Functionality
• Data quality and completeness
• Data sources
• Metadata quality and completeness

Executive Sponsor (by data warehouse iteration)

The Data Warehouse must have high-level and sustainable sponsorship. The Executive Sponsor must be politically viable and be able to garner and retain adequate resources for the construction and maintenance of the Data Warehouse.

The responsibilities of the Executive Sponsor include:
1. Knowledge of Data Warehouse systems
2. Leadership in the Decision support steering committee meetings
3. Knowledge of the organization’s Data Warehouse environment
4. Keeping the Data Warehouse out of internal cross-fire

The Executive Sponsor, as the responsible executive for the success or failure of the Data Warehouse, should be able to articulate and quantify the value of the Data Warehouse program.

The Decision support steering committee is designed as a forum for data stewards, IS, and the Executive Sponsor to make strategic discernment and prioritization over the major additions of usage, subject areas, and data sources to the Data Warehouse. It is imperative to the success of the Data Warehouse that the Executive Sponsor takes leadership and provides vision to the committee. Specific areas of focus:
• Data Sources
• Subject Areas
• Data Stewardship
• High-level Architecture
• Project Plan and Budget

The Executive Sponsor, as the responsible executive for the success or failure of the iteration of the Data Warehouse, should be able to articulate and quantify the value of the Data Warehouse project and its many elements.

The Data Warehouse, to be successful, must not fall victim to infighting. To maintain this countenance, the executive sponsor must keep the focus on the positive return-on-investment potential and realization of the Data Warehouse. Frequent status should be delivered to upper level executives.

Database Administrator

A key design point for a data warehouse group is the placement of the database administration function and the division of roles and responsibilities between the support group and the user community.

The responsibilities of a Database Administrator include:
1. Physical database design
2. Database maintenance
3. Backup and recovery
4. Participation on the Decision support steering committee
5. Data Replication
6. Repository Management
7. Security administration
8. Database loading
9. Performance Monitoring and Summary table creation

views. Specific tasks include: • Ensuring accurate and complete replication of applicable data during the available windows • Setup and administration of Oracle Replication Services or other relevant replication tool • Collision detection to avoid loss of data • Input to the Worldwide Data Architecture process based on replication functionality ©McKnight Associates. The Data Warehouse Database Administrator will translate the logical database design into a physically implementable design. and defaults for each column • Creating the databases and database objects such as the tables. some of the entities as domains rather than tables • The addition of timestamp and user ID columns to most tables to identify the person and the time of the row addition or change • The use of standard abbreviations in the table and column names • Assignment of data types. nullability. and indexes The Data Warehouse Database Administrator will maintain the tables as necessary for optimization. process. If the Data Warehouse becomes as a distributed system. The specific tasks may include: • Periodic table reorganization • Structure update with change control The Data Warehouse Database Administrator will ensure recoverability of the Data Warehouse with minimal loss of information. synonyms. The Data Warehouse Database Administrator will be responsible for the synchronization.10. reporting. a synchronization effort may be needed. Inc. 2000 6 . triggers. The specific tasks are: • Denormalizing the models based on potential queries. tablespaces. stored procedures. Ad-hoc data manipulation A brief description of each of these responsibilities follows. 
Specific tasks include: • Extensive testing of the backup and recovery processes • Ability to recover databases within the service level agreement • Ability to articulate any data loss encountered • Ensure logging and recoverability of database updates since the last backup • Take frequent backups – full and incremental as needed • Frequent offsite storage of the latest full backup in the event of a natural disaster Representation from the Data Warehouse Database Administration staff will participate on the Decision support steering committee. domains. as necessary. or replication. and system feeds that will be generated from the tables • Implement. defaults.
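The collision detection task listed under replication can be illustrated with a small sketch. It assumes, purely for illustration, that each site records which row it changed and when; the RowChange structure, the site names, and the find_collisions function are hypothetical and are not part of Oracle Replication Services or any other replication tool. A collision here means the same row was changed at more than one site since the last synchronization.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RowChange:
    """One change to a replicated row (hypothetical structure)."""
    table: str
    key: int
    site: str
    changed_at: datetime


def find_collisions(changes, last_sync):
    """Return (table, key) pairs changed at more than one site since last_sync."""
    sites_by_row = {}
    for change in changes:
        # Only changes made after the last synchronization can collide.
        if change.changed_at > last_sync:
            row = (change.table, change.key)
            sites_by_row.setdefault(row, set()).add(change.site)
    return sorted(row for row, sites in sites_by_row.items() if len(sites) > 1)


last_sync = datetime(2000, 1, 1)
changes = [
    RowChange("customer", 1, "us", datetime(2000, 1, 2)),
    RowChange("customer", 1, "eu", datetime(2000, 1, 3)),
    RowChange("customer", 2, "us", datetime(2000, 1, 2)),
    RowChange("orders", 7, "eu", datetime(1999, 12, 31)),
]
collisions = find_collisions(changes, last_sync)
print(collisions)
```

Rows flagged this way would still need a resolution policy (for example, review by the data steward) before either version is overwritten; the sketch only detects the conflict.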

The Data Warehouse Database Administrator will be responsible for the metadata management infrastructure. Tasks will include:
• Database maintenance for the database(s)
• Installation of the repository software, if applicable
• Front-end development
• User training

Only users approved by the data stewards will be granted security privileges on the Data Warehouse. The Data Warehouse Database Administrator will execute the grants or provide the ability for a central security group to do this. Specific tasks include:
• Creating security profiles
• Assuring new users are approved by the data steward(s)
• Assuring new users have taken training
• Granting privileges to the data stewards

For data that is sourced for the Data Warehouse, the Data Warehouse Database Administrator will, if necessary, code and install the programs to load the data into the Data Warehouse.

The Data Warehouse Database Administrator is the point person for issues of performance within the Data Warehouse environment. Specific tasks follow:
• Being first on call for performance issues from the user community
• Setting up the governors on the database systems to stop runaway queries
• Assist the user community in writing well-performing SQL
• Route system and network performance issues to the responsible party
• Spotting trends in queries and creating and maintaining summary tables for low-performing queries where possible
• Making the end user query tool aware of the summary tables

Occasionally, the most expedient way to create or restore consistency to the data in the Data Warehouse is to write insert, update, or delete statements against the database. Specifically, following the data steward review of the data load, there may be reconciliation queries to be run to bring the data to a level of consistency. As the in-house SQL expert, the Data Warehouse Database Administrator will perform these queries.

Data Administrator

A key design point for a data warehouse group is the placement of the data administration function and the division of roles and responsibilities between the DBAs and the data administrators.

The responsibilities of a Data Administrator include:
1. Logical data modeling for the data warehouse (sans the data marts)
2. Gathering business data requirements
3. Participation on the Decision support steering committee

The Data Warehouse Data Administrator will translate the user requirements into a logical database design.

Application Programmer – ETL Specialist

The responsibilities of an Application Programmer – ETL Specialist include:
1. Sourcing the data from the operational systems
2. Applying the business transformation rules
3. Preparing a database-loadable file for the Data Warehouse
4. Management of the deployment of the data acquisition tool(s)
5. Contributing the technical metadata to the metadata repository

A brief description of each of these responsibilities follows.

The Data Warehouse Application Programmer will extract data destined for the Data Warehouse from the operational systems that store the data. The specific tasks are:
• Work with the source system analysts to understand the windows available for data extraction
• Program, test, implement and maintain any data extraction programs necessary to extract the data from the operational systems needed to be moved to the Data Warehouse

The Data Warehouse Application Programmer is responsible for applying transformation rules as necessary to keep the data clean and consistent and therefore usable by the user community. Specific tasks include:
• Participation in design sessions chaired by data stewards and/or IT personnel where decisions are made involving the transformation from source to target
• Programming the data acquisition tool with the rules to be applied to the data
• Ensuring the correct application of the business rules through data query after the data is loaded into the Data Warehouse

Following the data extraction and rules application, the file(s) will need to be made ready for loading into the Data Warehouse. This is the responsibility of the Data Warehouse Application Programmer. Specific duties include:
• Obtaining complete knowledge of the physical database schema
• Preparing the files needed to load each table that has been designated to receive files from operational systems rather than direct input from data stewards
• Programming the data acquisition tool(s) with the tables to load the files into
• Working with the Data Warehouse Database Administrator to ensure the file loads properly into the Data Warehouse
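The two steps just described, applying transformation rules and preparing a database-loadable file, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular data acquisition tool: the column names, the cleaning rules in RULES, and the pipe-delimited file layout are all hypothetical.

```python
import csv
import io

# Hypothetical transformation rules: column name -> cleaning function.
RULES = {
    "customer_name": lambda v: v.strip().title(),
    "country": lambda v: v.strip().upper(),
    "amount": lambda v: f"{float(v):.2f}",
}


def transform(row):
    """Apply each rule to its column; columns without a rule pass through."""
    return {col: RULES.get(col, lambda v: v)(val) for col, val in row.items()}


def build_load_file(rows, columns):
    """Return pipe-delimited text with one transformed row per line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="|", lineterminator="\n"
    )
    for row in rows:
        writer.writerow(transform(row))
    return out.getvalue()


# Rows as they might arrive from an operational system extract.
extracted = [
    {"customer_name": "  acme corp ", "country": "us", "amount": "12.5"},
    {"customer_name": "globex", "country": " de ", "amount": "7"},
]
load_file = build_load_file(extracted, ["customer_name", "country", "amount"])
print(load_file)
```

Keeping the rules in one table, separate from the file-writing code, makes it easier to review them with the data stewards and to record them later as technical metadata.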

The metadata repository will contain technical metadata such as data sources and transformation rules applied. The Application Programmer will be responsible for the entry and maintenance of this information.

OLAP Specialist

The responsibilities of an OLAP Specialist include:
1. Management of the deployment, maintenance, and user support of the data access tool(s)
2. Identification and enablement of the user community to access the data in a manner consistent with their business goals

A brief description of each of these responsibilities follows.

The OLAP Specialist is responsible for the management of the deployment, maintenance, and user support of the data access tools. Specific duties include:
• Working with the System Administration staff to most effectively place server components in the architecture
• Keeping a version of the OLAP Tool in production that is under support by the tool vendor
• Installation of any client-side software needed on approved users’ desktops (this could also be the PC support group)
• Maintaining a high-level view of the OLAP environment and ensuring fit-for-purpose tool usage at the organization
• Ensuring access to business and technical metadata from within the OLAP tool

The OLAP Specialist is responsible for the identification and enablement of the user community to access the data in a manner consistent with their business goals. Specific duties include:
1. Profiling the user community early in the data warehouse iteration
2. Obtaining complete knowledge of the physical database schema
3. Training and supporting users on tool usage
4. Identification and scheduling of frequently needed reports

System Administrator (usually by SLA with the System Administration group)

The responsibilities of a System Administrator include:
1. Installing and maintaining the Database Management System (DBMS)
2. Data warehouse architecture (in partnership with the Chief Architect)
3. Monitoring network performance
4. Monitoring DASD utilization

A brief description of each of these responsibilities follows.

The Data Warehouse System Administrator will be responsible for data destined for the Data Warehouse from the operational systems that store the data. The specific tasks are:
• Tuning the operating system for DBMS fit
• Installing and configuring the DBMS
• Maintaining the DBMS levels supported by the DBMS vendor

The Data Warehouse System Administrator is responsible for architecting, with the Chief Architect, the placement of the Data Warehouse and data marts and the adjoining data acquisition and replication strategies. Specific tasks include:
• Participation on a decision support steering committee in decisions on architecting a world-wide data infrastructure that will accommodate the data entry and retrieval needs of the domestic and international user constituencies
• Knowledge of the network capabilities
• Data architecture comprising decision points around number of data stores and the amount and timing of replication and synchronization needed to achieve the objectives of having a single version of the truth for each Data Warehouse element

The Data Warehouse System Administrator is responsible for the performance of data transfers, either in response to a query or as part of a data replication or synchronization effort across the WANs and LANs. Specific tasks include:
• Proactive network monitoring for performance and accuracy
• Recommending and expanding capacity as needed to meet performance service levels

The Data Warehouse System Administrator is responsible for ensuring enough DASD is available and efficiently managed to accommodate existing and upcoming data needs. Specific tasks include:
• Proactive DASD monitoring
• Recommending and expanding capacity as needed to assure availability
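Proactive DASD monitoring can be approximated with a simple capacity check. The sketch below is an assumption-laden illustration: it uses Python's shutil.disk_usage on a local filesystem as a stand-in for whatever storage reporting the site actually has, and the capacity_report function, the 80% threshold, and the 50 GiB figure for upcoming data are hypothetical.

```python
import shutil


def capacity_report(total_bytes, used_bytes, upcoming_bytes, threshold_pct=80.0):
    """Compare current and projected utilization against a threshold."""
    current_pct = 100 * used_bytes / total_bytes
    projected_pct = 100 * (used_bytes + upcoming_bytes) / total_bytes
    return {
        "current_pct": round(current_pct, 1),
        "projected_pct": round(projected_pct, 1),
        # Recommend expansion when upcoming loads would cross the threshold.
        "expand": projected_pct > threshold_pct,
    }


# Live check against the current filesystem (stand-in for the warehouse DASD).
usage = shutil.disk_usage(".")
print(capacity_report(usage.total, usage.used, upcoming_bytes=50 * 1024**3))

# Fixed example: 70% used today, 85% once the upcoming data arrives.
example = capacity_report(total_bytes=1000, used_bytes=700, upcoming_bytes=150)
print(example)
```

Projecting utilization from upcoming data needs, rather than alerting only on current usage, matches the stated goal of accommodating both existing and upcoming data.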
