
By: Srishti Warman (UM8410)

Submitted to: Dr. Hitesh Kapoor

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. The tools used to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system.

Operational data sources provide the data for data warehousing. Designers determine which data has business value and should be inserted. Operational data is stored in OLTP (Online Transaction Processing) databases. OLTP databases can reside in transactional software applications such as Enterprise Resource Planning (ERP), supply chain, point-of-sale and customer service software. OLTPs are designed for transaction speed and accuracy.

Metadata ensures the sanctity and accuracy of data entering the data lifecycle process: data has to be in the right format and relevant. Organizations can take preventive action to reduce the cost of the ETL stage by having a sound metadata policy. A commonly used description of metadata is "data about data".
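To make "data about data" concrete, here is a minimal Python sketch, assuming that a metadata entry simply records each column name and its expected type. The names `ORDER_METADATA` and `conforms` and the sample rows are hypothetical. Rows that do not match their metadata are rejected before they reach the ETL stage, which is the kind of preventive check a metadata policy makes possible.

```python
from datetime import date

# Hypothetical metadata ("data about data") for one source table:
# each column name maps to the Python type its values must have.
ORDER_METADATA = {
    "order_id": int,
    "customer": str,
    "amount": float,
    "order_date": date,
}


def conforms(row: dict, metadata: dict) -> bool:
    """Return True if the row has exactly the described columns with the right types."""
    if set(row) != set(metadata):
        return False
    return all(isinstance(row[col], typ) for col, typ in metadata.items())


rows = [
    {"order_id": 1, "customer": "Acme", "amount": 19.99, "order_date": date(2023, 5, 1)},
    # order_id arrives as text, so this row is in the wrong format
    {"order_id": "2", "customer": "Acme", "amount": 5.0, "order_date": date(2023, 5, 2)},
]

valid = [r for r in rows if conforms(r, ORDER_METADATA)]
rejected = [r for r in rows if not conforms(r, ORDER_METADATA)]
```

A real metadata repository would also describe sources, owners and business meaning; only the format check is shown here.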

The extraction, transformation and loading (ETL) process, which includes cleaning the data, ensures that the data passes the quality threshold. ETL processes are also responsible for running scheduled tasks that extract data from OLTP systems.
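The three ETL steps can be sketched in Python with the standard `sqlite3` module and an in-memory database standing in for the warehouse. The source rows, the `sales` table and the cleaning rules are assumptions for illustration. Scheduling is left out; in practice a scheduler would run these steps periodically.

```python
import sqlite3


def extract():
    """Extract: rows as they might arrive from an OLTP system (hypothetical sample data)."""
    return [
        (" alice ", "2023-01-05", "120.50"),
        ("BOB", "2023-01-06", "80"),
        ("carol", "2023-01-07", "n/a"),  # cannot be parsed, fails the quality threshold
    ]


def transform(raw_rows):
    """Transform/clean: normalise names, parse amounts, drop rows that cannot be parsed."""
    clean = []
    for name, day, amount in raw_rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # reject the row instead of loading bad data
        clean.append((name.strip().title(), day, value))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Here `loaded` holds only the two rows that passed cleaning.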

Repositories are the databases that store active data of business value. Data warehouse modelling design is optimized for data analysis. There are variants of data warehouses: data marts and ODS. Data marts are smaller data warehouses built at a departmental rather than a company-wide level. ODS (Operational Data Stores) hold recent data before it is migrated to the data warehouse, and they hold data with a deeper history than OLTPs.
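The ODS role of holding recent data until it migrates to the warehouse can be sketched as follows. Lists of dictionaries stand in for the two stores. The 30-day retention window and the function name `migrate_old_rows` are hypothetical choices, not part of any standard.

```python
from datetime import date, timedelta


def migrate_old_rows(ods, warehouse, today, keep_days=30):
    """Move ODS rows older than keep_days into the warehouse; return the rows left in the ODS."""
    cutoff = today - timedelta(days=keep_days)
    still_recent = []
    for row in ods:
        if row["day"] < cutoff:
            warehouse.append(row)  # long-term history accumulates in the warehouse
        else:
            still_recent.append(row)  # recent data stays in the ODS
    return still_recent


ods = [{"id": 1, "day": date(2023, 1, 1)}, {"id": 2, "day": date(2023, 3, 25)}]
warehouse = []
ods = migrate_old_rows(ods, warehouse, today=date(2023, 4, 1))
```

After the call, the January row has moved to `warehouse` and only the March row remains in `ods`.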

Front-end applications are the tools business users use to interact with the data stored in the repositories:
- Data mining is the discovery of useful patterns in data, used for prediction analysis and classification.
- OLAP (Online Analytical Processing) is used to analyze historical data and slice out the business information required. OLAP tools are often used by marketing managers.
- Reporting tools provide reports on the data. Data is displayed to show its relevance to the business and to track key performance indicators (KPIs).
- Data visualization tools display data from the data repository. Data visualization is often combined with data mining and OLAP tools, and it can allow the user to manipulate data to show relevance and patterns.
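Two typical OLAP operations, slicing and rolling up, can be sketched in plain Python over a tiny fact table. The table layout `(year, region, product, revenue)` and its values are hypothetical. A real OLAP tool runs these operations over a multidimensional cube, but the logic is the same: fix one dimension, then aggregate the measure over the rest.

```python
from collections import defaultdict

# A tiny hypothetical fact table: (year, region, product, revenue).
facts = [
    (2022, "East", "Candy", 100.0),
    (2022, "West", "Candy", 60.0),
    (2023, "East", "Candy", 120.0),
    (2023, "East", "Soda", 40.0),
    (2023, "West", "Soda", 90.0),
]


def slice_cube(rows, year):
    """Slice: fix one dimension (year) to a single value."""
    return [r for r in rows if r[0] == year]


def roll_up(rows, key_index):
    """Roll-up: total the revenue measure grouped by one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


revenue_2023_by_region = roll_up(slice_cube(facts, 2023), key_index=1)
revenue_by_year = roll_up(facts, key_index=0)
```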

Kimball views the data warehouse as a collection of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is the combination of these data marts, integrated through conformed (shared) dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at the local departmental level.
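A conformed dimension lets separately built data marts be combined. The following minimal sketch assumes two hypothetical marts, sales and support, that both key their facts on the same customer dimension. Because the keys and the `segment` attribute mean the same thing in both marts, their results can be lined up into one enterprise view without any translation step.

```python
# Hypothetical conformed customer dimension shared by both marts.
customer_dim = {
    1: {"name": "Wylie T. Coyote", "segment": "Retail"},
    2: {"name": "Roadrunner", "segment": "Wholesale"},
}

# Sales mart fact rows: (customer_key, revenue).
sales_facts = [(1, 500.0), (2, 300.0), (1, 250.0)]
# Support mart fact rows: (customer_key, tickets opened).
support_facts = [(1, 4), (2, 1)]


def totals_by_segment(facts, dim):
    """Aggregate a mart's measure by an attribute of the conformed dimension."""
    out = {}
    for key, measure in facts:
        segment = dim[key]["segment"]
        out[segment] = out.get(segment, 0) + measure
    return out


sales_by_segment = totals_by_segment(sales_facts, customer_dim)
tickets_by_segment = totals_by_segment(support_facts, customer_dim)

# Shared keys and attributes make the two marts' results directly comparable.
enterprise_view = {
    seg: (sales_by_segment.get(seg, 0), tickets_by_segment.get(seg, 0))
    for seg in sorted(set(sales_by_segment) | set(tickets_by_segment))
}
```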

Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from, for example, the online store. Other subject areas can be added to the data warehouse as the need arises; point-of-sale (POS) data can be added later if management decides it is necessary. In this approach, a data mart is created from a subject area of the data warehouse.

Data warehouses are often not used to their full potential. Most companies use data warehousing for 1) validation, 2) tactical reporting, or 3) exploration.

Validation: the user community uses data to validate what it already believes to be true. For example, Denver consumers buy products differently from New York City consumers. New Yorkers tend to buy a candy bar on a whim (city population buying patterns), whereas Denver consumers are less likely to do so (rural population buying patterns). This had been hypothesized for years, and empirical data confirms it. According to one study report, about 45% of data warehouse usage is for validation.

Tactical reporting: the user community uses the data for a tactical reason. For example, salesperson Daffy Duck of Acme Corp. is going to visit customer Wylie T. Coyote and wants to know what Wylie T. Coyote bought during the last year. There is no comparison of Wylie T. Coyote with customer Roadrunner to see whether anything suggests new products to sell. About 40% of data warehouse usage is for tactical reporting.

Exploration: searching for ideas or knowledge that you did not have before. This is where data mining techniques (e.g., association, classification, genetic algorithms) and applications (e.g., market basket analysis, fraud detection) come into play. Only about 15% of data warehouse usage is for exploration.
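Market basket analysis is a common use of association techniques. It can be illustrated by counting how often pairs of items appear together. The following is a simplified pair-counting sketch, not a full Apriori implementation. The baskets and the 0.5 support threshold are hypothetical, and for brevity each rule is reported in one direction only (from the alphabetically first item to the second). Support is the share of baskets containing both items. Confidence is the share of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"candy", "soda"},
    {"candy", "soda", "chips"},
    {"soda", "chips"},
    {"candy"},
]


def pair_rules(baskets, min_support=0.5):
    """Return sorted (a, b, support, confidence) tuples for item pairs meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(p for b in baskets for p in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # confidence(a -> b): fraction of baskets with a that also contain b
            rules.append((a, b, support, count / item_counts[a]))
    return sorted(rules)


rules = pair_rules(baskets)
```

Here two rules survive the threshold: candy to soda (confidence 2/3) and chips to soda (confidence 1.0).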

Advantages of data warehousing:
- One consistent data store for reporting, forecasting and analysis
- Easier and more timely access to data
- Improved end-user productivity
- Reduced costs
- Scalability
- Flexibility
- Reliability
- Competitive advantage
- Trend analysis and detection
- Key ratio indicator measurement and tracking
- Drill-down analysis
- Problem monitoring
- Executive analysis

Disadvantages of data warehousing:
- Preparation may be time-consuming: one of the most common data warehousing issues is the need to regularly extract, clean and load data into the system. Even when this process is fully automated, it still takes time, as does regular maintenance.
- Compatibility with existing systems: using data warehousing technology may require a company to modify the database system already in place. Given the cost of the computer systems and software needed, this could be the foremost concern for a business adopting the model.
- Security issues: data warehousing technology, however valuable, may contain security flaws. If the database contains sensitive information, its use may be restricted to a limited group of people, and precautions will be required to ensure that access is not compromised. Limited data access can also affect the overall utilization of the data strategy.

- High costs: setting up a data warehouse requires a very high initial investment.
- Limited flexibility: the data warehousing technique benefits only those who are knowledgeable and up to date with the latest technology.
- Hard alterations: it is difficult to accommodate changes in data types and ranges, data source schemas, indexes and queries.

A data warehouse is a specially set-up database designed to hold large amounts of data for reporting purposes. While a normal database is optimized for transactional activity (keeping only a small amount of history), a data warehouse is optimized for large-scale reporting. Within a data warehouse, data from several systems is typically merged to present a global enterprise view. Data warehouses typically keep a very long history, from several years to the entire life of the company, so that long-term trends can be viewed. All data warehouses are databases, but not all databases are data warehouses.