Lecture Notes For DBMS and Data Mining and Data Warehousing

Unit IV Lecture 24
Data-Warehousing Introduction

Present Business Scenario

Over the last 20 years, $1 trillion has been invested in new computer systems to gain competitive advantage. The vast majority of these systems have automated business processes to make them faster, cheaper, and more responsive to the customer. Electronic point of sale (EPOS) at supermarkets, itemized billing at telecommunication companies (telcos), and mass-market mailing at catalog companies are some examples of such "Operational Systems". These systems computerized the day-to-day operations of business organizations. Some characteristics of operational systems are as follows:

• Most organizations have a number of individual operational systems (databases, applications).
• On-Line Transaction Processing (OLTP) systems capture the business transactions that occur.
• An Operational System is a system that is used daily (perhaps constantly) to perform routine operations that are part of the normal business processes.
• Examples: order entry, purchasing, stock/bond trading, bank operations.
• Users make short-term, localized business decisions based on operational data, e.g., "Can I fill this order based on the current units in inventory?"

At present almost all businesses have operational systems, so these systems no longer give them any competitive advantage. Over the years these systems have gathered a vast amount of data, and companies are now realizing the importance of this "hidden treasure" of information. Efforts are under way to tap into this information to improve the quality of decision-making. A "data warehouse" is a repository of data collected from the various operational systems of an organization. This data is then comprehensively analyzed to gain competitive advantage. The analysis is used mainly for decision making at the top level.

Need For Data Warehousing

• Data Warehouses (DW) provide users with current and historical decision-support information that is hard to get from traditional operational data stores.
• DW can provide strategic business opportunities by allowing customers and vendors access to corporate data while maintaining the necessary security measures.
• DW gives competitive advantage from a business perspective, as it allows decisions to be taken quickly and correctly by providing all available data in a non-technical, user-friendly way. It can also handle incremental increases of data in bulk.
• DW can address the incompatibility of informational and operational transaction systems.

Department of Electrical and Electronics

By: Sulabh Bansal

Operational and Informational data stores

• Various OLTP (On-line Transaction Processing) systems (e.g. financial, order entry, work scheduling, and point-of-sale systems) create operational data in corporations.
• Operational data is detailed, non-redundant, updateable, and reflects current structure. It answers questions such as "How many gadgets were sold to customer number 123876 on September 19?" OLTP is very fast and efficient at recording business transactions, but not so good at providing answers to high-level strategic questions.
• Informational data is organized around subjects such as customer, vendor, and product. It is often summarized, is redundant to support varying data views, and is non-updateable. It answers decision-making questions such as "What three products resulted in the most frequent calls to the hotline over the past quarter?" Informational data is obtained from operational data sources (including any or all applications, databases, and computer systems within the enterprise) after cleaning, renaming, and providing access methods.

Component Systems:

• Legacy Systems: Any information system currently in use that was built using previous technology generations. Most legacy systems are operational in nature, largely because the automation of transaction-oriented business processes had long been the priority of IT projects.
• Source Systems: Any system from which data is taken for a data warehouse. A source system is often called a legacy system in a mainframe environment.
• Operational Data Stores (ODS): An ODS is a collection of integrated databases designed to support the monitoring of operations. Unlike the databases of OLTP applications (which are function oriented), the ODS contains subject-oriented, volatile, and current enterprise-wide detailed information. It serves as a system of record that provides comprehensive views of data in operational sources. Like data warehouses, ODSs are integrated and subject-oriented. However, an ODS is always current and is constantly updated. The ODS is an ideal data source for a data warehouse, since it already contains integrated operational data as of a given point in time. In short, an ODS is an integrated collection of clean data destined for the data warehouse.
• Data Warehousing Systems: A data warehousing system can perform advanced analysis of operational data without impacting operational systems.
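The contrast between an operational question and an informational question can be sketched in a few lines of Python using the standard sqlite3 module. The table names, columns, dates, and sample rows below are hypothetical and were invented only to mirror the two example questions quoted in these notes; a real warehouse would hold the informational data in a separate store.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_no INTEGER, product TEXT, sale_date TEXT, qty INTEGER)"
)
conn.execute("CREATE TABLE hotline_calls (product TEXT, call_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (123876, "gadget", "2023-09-19", 2),
        (123876, "gadget", "2023-09-19", 1),
        (123876, "widget", "2023-09-20", 5),
        (555001, "gadget", "2023-09-19", 4),
    ],
)
conn.executemany(
    "INSERT INTO hotline_calls VALUES (?, ?)",
    [
        ("gadget", "2023-07-03"), ("gadget", "2023-08-11"), ("gadget", "2023-09-02"),
        ("widget", "2023-07-15"), ("widget", "2023-08-30"),
        ("gizmo", "2023-09-09"),
        ("doohickey", "2023-05-01"),  # falls outside the quarter queried below
    ],
)


def gadgets_sold(customer_no, day):
    """Operational question: a narrow lookup of detailed, current data."""
    row = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM sales "
        "WHERE customer_no = ? AND product = 'gadget' AND sale_date = ?",
        (customer_no, day),
    ).fetchone()
    return row[0]


def top_hotline_products(start, end, n=3):
    """Informational question: an aggregate over a whole period."""
    return conn.execute(
        "SELECT product, COUNT(*) AS calls FROM hotline_calls "
        "WHERE call_date BETWEEN ? AND ? "
        "GROUP BY product ORDER BY calls DESC, product LIMIT ?",
        (start, end, n),
    ).fetchall()


print(gadgets_sold(123876, "2023-09-19"))                # 3
print(top_hotline_products("2023-07-01", "2023-09-30"))  # [('gadget', 3), ('widget', 2), ('gizmo', 1)]
```

The first query touches a handful of rows for one customer and one day; the second scans and summarizes a quarter of history, which is the kind of workload a data warehousing system takes off the operational systems.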

Definition:

A Data Warehouse is a
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
collection of data in support of management decisions.

Data warehouses are mostly populated with periodic migrations of data from operational systems. The second source is made up of external, frequently purchased, databases. Examples of this data would include lists of income and demographic information. This purchased information is linked with internal data about customers to develop a good customer profile.

• Subject Oriented
o OLTP databases usually hold information about small subsets of the organization. For example, a retailer might have separate order entry systems and databases for retail, catalog, and outlet sales. Each system will support queries about the information it captures. But if somebody wants to find out details of all sales, then these "separate" systems are not adequate. To address this type of situation, your data warehouse database should be subject-oriented, organized into subject areas like sales, rather than around OLTP data sources.
o A data warehouse is organized around major subjects such as customer, products, sales, etc. Data are organized according to subject instead of application.
o For example, an insurance company using a data warehouse would organize their data by customer, premium, and claim instead of by different products (auto, life, property, etc.).
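As a minimal sketch of subject orientation, the retailer example above can be mimicked with three in-memory lists standing in for the separate retail, catalog, and outlet order-entry systems. All names, products, and amounts here are hypothetical; the point is only that a single "sales" subject area can answer a question that no individual source system can.

```python
# Hypothetical stand-ins for three separate OLTP order-entry systems.
retail_orders = [{"product": "lamp", "amount": 120.0}, {"product": "desk", "amount": 80.0}]
catalog_orders = [{"product": "lamp", "amount": 45.5}]
outlet_orders = [{"product": "chair", "amount": 30.0}]


def build_sales_subject_area(sources):
    """Combine per-application order lists into one 'sales' subject area.

    `sources` maps a channel name to that channel's list of orders. Each
    output row keeps its channel so the origin is still visible.
    """
    sales = []
    for channel, orders in sources.items():
        for order in orders:
            sales.append({"channel": channel, **order})
    return sales


def total_sales(sales, product=None):
    """Total amount across all channels, optionally for a single product."""
    return sum(
        row["amount"] for row in sales if product is None or row["product"] == product
    )


sales = build_sales_subject_area(
    {"retail": retail_orders, "catalog": catalog_orders, "outlet": outlet_orders}
)
print(total_sales(sales))          # 275.5
print(total_sales(sales, "lamp"))  # 165.5
```

In this sketch the source lists are left untouched; the subject area is a separate structure built from them, in line with the idea that the warehouse is organized by subject rather than by application.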

• Integrated
o A data warehouse is usually constructed by integrating multiple, heterogeneous sources, such as relational databases, flat files, and OLTP files.
o When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent.
o For example, in the above system, the retail system uses a numeric 7-digit code for products, the outlet system code consists of 9 alphanumeric characters, and the catalog system uses 4 letters and 4 digits. To create a useful subject area, the source data must be integrated. There is no need to change the coding in these systems, but there must be some mechanism to modify the data coming into the data warehouse and assign a common coding scheme.

• Nonvolatile
o Unlike operational databases, warehouses primarily support reporting, not data capture.
o A data warehouse is always a physically separate store of data.
o Due to this separation, data warehouses do not require transaction processing, recovery, concurrency control, etc.
o The data are not updated or changed in any way once they enter the data warehouse, but are only loaded, refreshed, and accessed for queries.
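One possible mechanism for assigning a common coding scheme is a cross-reference table applied while data is loaded. The sketch below assumes the three code formats described above (7 digits for retail, 9 alphanumeric characters for outlet, 4 letters plus 4 digits for catalog); the specific codes, the `PROD-…` warehouse keys, and the function name are hypothetical.

```python
import re

# Code formats per source system, as described in the notes.
SOURCE_FORMATS = {
    "retail": re.compile(r"\d{7}"),
    "outlet": re.compile(r"[A-Za-z0-9]{9}"),
    "catalog": re.compile(r"[A-Za-z]{4}\d{4}"),
}

# Hypothetical cross-reference table: (source system, source code) -> warehouse code.
CODE_MAP = {
    ("retail", "1234567"): "PROD-0001",
    ("outlet", "AB12CD34E"): "PROD-0001",
    ("catalog", "GADG0042"): "PROD-0001",
    ("retail", "7654321"): "PROD-0002",
}


def to_warehouse_code(system, code):
    """Translate a source-system product code into the common warehouse code.

    Raises ValueError if the code does not match the system's format and
    KeyError if the system or the code has no entry in the tables above.
    """
    if not SOURCE_FORMATS[system].fullmatch(code):
        raise ValueError(f"{code!r} is not a valid {system} product code")
    return CODE_MAP[(system, code)]


# The same product arrives under three different source codes.
incoming = [("retail", "1234567"), ("outlet", "AB12CD34E"), ("catalog", "GADG0042")]
print({to_warehouse_code(system, code) for system, code in incoming})  # {'PROD-0001'}
```

The source systems keep their own codes; only the data flowing into the warehouse is rewritten, which matches the point that the operational coding does not need to change.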

• Time Variant
o Data are stored in a data warehouse to provide historical perspective.
o Every key structure in the data warehouse contains, implicitly or explicitly, an element of time.
o A data warehouse generally stores data that is 5-10 years old, to be used for comparisons, trends, and forecasting.

Other terms related to the data warehouse:

• Current detail data: acquired directly from the operational database. It is organized along the subject lines (customer profile data, customer activity data, demographic data, sales data, etc.).
• Old detail data: represents the history of the subject data areas.
• Data mart: it may contain lightly summarized departmental data. A collection of data marts composes an enterprise-wide data warehouse.
• Summarized data: data aggregated along the lines required for executive-level reporting, trend analysis, and enterprise-wide decision making. It is much smaller than current data or old detail data.
• Metadata: it is data about data. It contains the location and description of warehouse system components, and the names, definition, structure, and content of the data warehouse and end-user views.
• Drill-down: traversing the summarization levels from highly summarized data to the underlying current data or detail. For example, an analyst can drill down from sales volumes in North America into the state, county, and city.

Operational Systems vs Data Warehousing Systems:

Operational                                        Data Warehouse
-------------------------------------------------  -------------------------------------------------
Holds current data                                 Holds historic data
Data is dynamic                                    Data is largely static
Read/write accesses                                Read-only accesses
Repetitive processing                              Ad hoc complex queries
Transaction driven                                 Analysis driven
Application oriented                               Subject oriented
Used by clerical staff for day-to-day operations   Used by top managers for analysis
Normalized data model (ER model)                   Denormalized data model (dimensional model)
Must be optimized for writes and small queries     Must be optimized for queries involving a large
                                                   portion of the warehouse
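The drill-down term defined above can be illustrated with a small Python sketch that summarizes the same detail rows at successively finer levels. The region, state, county, city hierarchy follows the North America example in these notes; the place names, unit counts, and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, state, county, city, units sold).
FACTS = [
    ("North America", "California", "Los Angeles", "Long Beach", 40),
    ("North America", "California", "Los Angeles", "Pasadena", 25),
    ("North America", "California", "Orange", "Irvine", 10),
    ("North America", "Texas", "Travis", "Austin", 30),
]
LEVELS = ("region", "state", "county", "city")


def summarize(facts, level, **filters):
    """Total units grouped by `level`, restricted to rows matching `filters`.

    Filters are given as level=value pairs, e.g. state="California".
    """
    idx = LEVELS.index(level)
    totals = defaultdict(int)
    for row in facts:
        if all(row[LEVELS.index(name)] == value for name, value in filters.items()):
            totals[row[idx]] += row[-1]
    return dict(totals)


# Drill down from the most summarized level toward the detail.
print(summarize(FACTS, "region"))                         # {'North America': 105}
print(summarize(FACTS, "state", region="North America"))  # {'California': 75, 'Texas': 30}
print(summarize(FACTS, "county", state="California"))     # {'Los Angeles': 65, 'Orange': 10}
print(summarize(FACTS, "city", county="Los Angeles"))     # {'Long Beach': 40, 'Pasadena': 25}
```

Each step reuses the same detail data with a narrower filter and a finer grouping level. A production warehouse would typically store the higher-level totals as summarized data rather than recomputing them on every query.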

Advantages of Data Warehousing:

• Potential high Return on Investment
• Competitive Advantage
• Increased Productivity of Corporate Decision Makers

Problems with Data Warehousing

• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• High maintenance
• Long duration projects
• Complexity of integration

Data Warehouse Architecture

A typical data warehousing architecture is illustrated below:

[Figure: typical data warehousing architecture]