A data warehouse contains integrated data from multiple sources made available for analysis. It is subject-oriented, non-volatile, and stores historical data. Data is extracted from operational systems, cleaned, and loaded into the warehouse. The warehouse uses a relational database structure and is optimized for querying and reporting rather than transactions. Data marts containing relevant data for a department or subject area can be constructed from the larger data warehouse.
• A data warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context.
• A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decision-making process.

Data warehouse: subject-oriented
• Oriented to the major subject areas of the corporation that have been defined in the data model.
• For example, for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.

Data warehouse: integrated
• There is no consistency in encoding or naming conventions among different data sources.
• The data sources are heterogeneous.
• When data is moved to the warehouse, it is converted to a consistent form.

Data warehouse: nonvolatile
• Operational data is regularly accessed and manipulated a record at a time, and updates are made to data in the operational environment.
• Warehouse data, by contrast, is loaded and read but is not normally updated in place.

Data warehouse: time-variant
• The time horizon for the data warehouse is significantly longer than that of operational systems.
• Operational database: current-value data.

Building blocks or components
• Metadata: good metadata is essential to the effective operation of a data warehouse and is used in data collection, data transformation and data access.
• Metadata maps the translation of information from the operational system to the analytical system.

Data marts
• Data marts are smaller than data warehouses and generally contain information from a single department of a business or organisation. The current trend in data warehousing is to develop a data warehouse with several smaller related data marts for specific kinds of queries and reports.

Security
• As with any information system, the security of data is determined by the hardware, software and procedures that created it.
• The reliability and authenticity of the data and information extracted from the warehouse will be a function of the reliability and authenticity of the warehouse and the various source systems.

Construction
• The steps in planning a data warehouse are identical to the steps for any other type of computer application. Users must be involved to determine the scope of the warehouse and what business requirements need to be met.

Why a warehouse
• Two approaches:
  1. Query-driven (lazy)
  2. Warehouse (eager)
• The traditional research approach: query-driven (lazy, on demand)

Disadvantages of the query-driven approach
• Delay in query processing
• Slow or unavailable information sources
• Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Has not caught on in industry

The warehousing approach
• Information is integrated in advance
• Stored in the warehouse for direct querying

Advantages of the warehousing approach
• High query performance, but not necessarily the most current information
• Does not interfere with local processing at sources
• Complex queries run at the warehouse

Data warehouse architectures
1. Single-layer
• Every data element is stored once only
• Virtual warehouse
2. Two-layer
• Real-time + derived data
• Most commonly used approach in industry today
3. Three-layer
• Transformation of real-time data to derived data really requires two steps: view level ("particular informational needs")
• Physical implementation of the data warehouse

Issues in data warehousing
• Warehouse design
• Extraction: wrappers, monitors
• Integration: cleansing and merging
• Warehouse specification and maintenance
• Optimisation
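
The integration and cleansing ideas above (inconsistent encodings across sources, conversion on the way into the warehouse, and integrating in advance rather than at query time) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the source names (policy_system, claims_system), the gender encodings, the WarehouseRow type, and the functions integrate and count_by_gender are all hypothetical and are not taken from any particular warehouse product.

```python
from dataclasses import dataclass

# Hypothetical encodings: each operational source codes gender differently.
GENDER_MAPS = {
    "policy_system": {"m": "M", "f": "F"},
    "claims_system": {"1": "M", "0": "F"},
}


@dataclass(frozen=True)
class WarehouseRow:
    customer_id: int
    gender: str   # always "M" or "F" once inside the warehouse
    source: str   # which operational system supplied the row


def integrate(sources):
    """Eagerly convert and merge all source records into warehouse rows."""
    warehouse = {}
    for source_name, records in sources.items():
        mapping = GENDER_MAPS[source_name]
        for rec in records:
            # Cleansing: normalise whitespace and case before converting.
            code = str(rec["gender"]).strip().lower()
            row = WarehouseRow(rec["customer_id"], mapping[code], source_name)
            # Merging: keep the first record seen for each customer.
            warehouse.setdefault(row.customer_id, row)
    return warehouse


def count_by_gender(warehouse):
    """A query that touches only the warehouse, never the sources."""
    counts = {}
    for row in warehouse.values():
        counts[row.gender] = counts.get(row.gender, 0) + 1
    return counts


# Made-up sample data for illustration only.
SOURCES = {
    "policy_system": [
        {"customer_id": 1, "gender": "M"},
        {"customer_id": 2, "gender": "f "},
    ],
    "claims_system": [
        {"customer_id": 2, "gender": 0},
        {"customer_id": 3, "gender": 1},
    ],
}

warehouse = integrate(SOURCES)
print(count_by_gender(warehouse))
```

Because integrate runs before any query is asked, count_by_gender never contacts the operational systems. That matches the trade-off listed above: high query performance and no interference with local processing, at the cost of results that may not reflect the most current source data.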