Version 6.0 - 04/18/2000
CIS 465 - Data Warehousing
Can your database answer questions like these?
What is the cost of staff to break into a new line of business?
What are the travel routes of my competition's inventory?
At what velocity is my competitor moving toward a common goal?
How will a transaction on a certain date be affected by currency exchange rates?
Is a foreign labor source likely to produce a higher quality product?
Which 20% of the problem creates 80% of the problems?
By product and location, how can we regain a lost customer base?
Which skill and staff levels are most likely to accept the voluntary layoff package?
Data is Difficult to Manage
Amount of data increases exponentially: past data must be kept for long times, and new data are added rapidly.
Data are scattered throughout organizations, collected by many individuals, using different methods and devices.
Only small portions of an organization's data are relevant for specific decisions.
An ever-increasing amount of external data needs to be considered in making organizational decisions.
Data is Difficult to Manage - contd.
Raw data may be stored in different computing systems, formats, and human and computer languages.
Legal requirements relating to data differ among countries and change frequently.
Selecting data management tools can be difficult because of the large number of tools available.
Data security, quality, and integrity are critical, yet easily jeopardized.
Data and Knowledge Management
Businesses do not run on data. They run on information and their knowledge of how to put that information to use successfully.
Some Information Concepts
Data: Unorganized facts and figures (raw material).
Information: Data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current or prospective actions or decisions.
- What is a "finished product" to one may be "raw materials" to someone else.
Information:
- adds to a representation
- corrects or confirms previous information
- has "surprise" value in that it tells us something we did not know, or could not predict.
Definitions: Information vs. Knowledge
Knowledge: a combination of instincts, ideas, rules, and procedures that guide actions and decisions.
Helping to provide the best available knowledge for decision making is another role of information systems.
Relationship Between Data, Information, and Knowledge
The difference between data and information is easy to remember. It is often cited as the reason why systems that collect large amounts of information fail to meet management's information needs.
There are many methods of converting data into information for decision making.
Managers take action based on information about a current situation plus their accumulated knowledge.
Actions taken feed the process of accumulating more knowledge (experience).
Example: How do medical students become competent physicians?
Attributes of Quality Information
Timeliness
Completeness
Conciseness
Relevance
Accuracy
Precision
Appropriateness of Form
Special Characteristics of Information
Usefulness depends on a combination of quality, accessibility, and presentation.
Soft data may be as important as hard data.
Ownership of information may be hard to maintain.
More information is not always better (information overload).
One person's information may be another person's noise.
Politics can often hide or distort information.
Sources of Data
Internal Data
- Data about people, products, services, processes.
- Often stored in corporate databases (e.g. Sales or HR).
- Some data may be disparate in different regions, but accessible by networks.
Personal Data
- Individuals document expertise by creating personal data (subjective estimates or mental models).
- Can be stored on PCs. Some is kept in heads.
External Data
- Many sources, or available on the Web.
Some Sources of Business External Data
Federal Publications
- Survey of Current Business
- Monthly Labor Review
- Federal Reserve Bulletin
- Employment and Earnings
- Commerce Business Daily
- Census Bureau
Other
- International Monetary Fund
- Moody's
- Standard & Poor's
- Advertising Age
- Dialog and Lexis/Nexis
Other - contd.
- ABI/Inform
- Annual Editor & Publisher Market Guide
- Thomas Register On-line
Indexes
- Encyclopedia of Business Information Sources
What is a data warehouse?
A data warehouse is a pool of data organized in a format that enables users to interpret data and convert it into useful information to gain knowledge from this interpretation.
It is a single place that contains complete and consistent data from multiple sources.
Data warehousing is the act of a business person extracting business value from the data stored in the data warehouse.
Collecting Raw Data Is Not Easy
- collect in the field
- elicit from people
- collect manually, electronically, or by sensors.
Data collection from external sources is not easy either.
Data collection technology has not kept pace with advances of data storage technology.
Data Quality is an Important Issue.
Bottom Line: Garbage In, Garbage Out (GIGO).
Data Quality Issues
- accuracy, objectivity, believability, reputation.
- accessibility and access security
- relevancy, value-added, timeliness, completeness, amount of data.
- interpretability, ease of understanding, concise representation, consistent representation
Why Data Warehousing?
Managers do not make decisions that are "good" or "bad"; they make decisions on the basis of good or bad information. Management information =
- a. the right information
- b. in the right form
- c. at the right time.
Most transaction-based information systems have difficulty delivering this information.
Why Data Warehousing - 2
Not the right information:
- data not easily accessible
- meaning is subtly (or significantly) different from the question context.
- Information is presented with too much or too little detail, covers the wrong time spans, or is in the wrong intervals.
Why Data Warehousing - 3
Not the right time:
- Data comes from a variety of different systems which are resident on a variety of different technology platforms.
- Getting this information may require the efforts of highly skilled professionals who are not generally available at the whim of business managers.
Why Data Warehousing - 4
Not the right format:
- If data is extracted, merged, and converted into meaningful information, often it is not in a usable format.
- Printouts weighing 10 pounds are not in the right format.
- A diskette with a COBOL file description is not in the right format.
- Users will want it loadable into a particular PC tool or spreadsheet with which they are familiar.
The Dilemma for Corporate IT
How to control scarce IT resources consumed by insatiable user demand for ad-hoc reports.
Often the extract programs have few reusable components.
Each ad-hoc report generated by IT and analyzed by the user generates three more reports to further illuminate the insights gleaned in the first.
The user is on a "voyage of discovery in a sea of data".
The Response of Corporate IT
New methodologies: Align the IT systems with the business goals and requirements.
These techniques concentrate on business process requirements, not decision support requirements.
Transaction systems must be rigorously specified in advance. They are an intersection between the organization and the customer. These systems should not be a "voyage of discovery" for either.
Transaction Systems vs. Analytical Support Systems
Transaction Systems:
- Insert an order for 300 baseballs.
- Update this passenger's airline reservation.
- Close out accounts payable records for this vendor.
- What is the current checking account balance for this customer?
Analytical Support Systems:
- Did the sales promotion last quarter do better than the same promotion last year?
- Is the five-day moving average for this security leading or trailing actual prices?
- Which product line sells best in middle-America and how does this correlate to demographic data?
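The contrast between the two kinds of request can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is only an illustration: the `orders` table, its columns, and the sample rows are assumptions made up for the sketch, not part of the course material. The transaction-style function touches a single record, while the analytical-style function scans many records and summarizes them.

```python
import sqlite3

# Hypothetical orders table; all names and rows here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, "
    "quarter TEXT, quantity INTEGER)"
)


def insert_order(product, quarter, quantity):
    """Transaction-style work: write exactly one record, then commit."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (product, quarter, quantity) VALUES (?, ?, ?)",
            (product, quarter, quantity),
        )
    return cur.lastrowid


def units_by_quarter(product):
    """Analytical-style work: read many records and aggregate them."""
    rows = conn.execute(
        "SELECT quarter, SUM(quantity) FROM orders "
        "WHERE product = ? GROUP BY quarter ORDER BY quarter",
        (product,),
    ).fetchall()
    return dict(rows)


insert_order("baseball", "1999-Q4", 300)
insert_order("baseball", "1999-Q4", 120)
insert_order("baseball", "2000-Q1", 250)
insert_order("glove", "2000-Q1", 40)

totals = units_by_quarter("baseball")
print(totals)
```

On a real system the two workloads would normally run against separate databases; a single in-memory database is used here only to keep the sketch short.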
Analytical Processing
Analytical Processing today includes what in the past have been called:
- DSS (Decision Support Systems)
- EIS (Executive Information Systems)
- ESS (Executive Support Systems)
It is an evolution of "End-User Computing".
Placing strategic data access in the hands of decision makers aids productivity and enables them to be better decision makers.
Key Difference: OLTP vs. OLAP
OLTP: processing specific functions.
OLAP: providing flexibility for undetermined analysis.
A Multidimensional Database
Data for Decision Support
The data must be integrated - requires data from many separate internal corporate databases.
The data must be enriched - through integration with other external data.
The data must be available - and not constrained by machine resources.
Sources of Data
Internal Data:
- Financial Systems
- Logistics Systems
- Sales Systems
- Production Systems
- Personnel Systems
- Billing Systems
- Information Systems
External Data Needs:
- to recognize opportunities
- to detect threats
- to identify synergies
Sources of Data - 2
External Data Categories
- Competitor Data
- Economic Data
- Industry Data
- Credit Data
- Commodity Data
- Econometric Data
- Psychometric Data
- Meteorological Data
- Demographic Data
- Sales & Marketing Data
Operational Control vs. Operational Strategy
Data is a source not just of operational control, but of operational strategy.
Operational strategy is an attempt to describe the need, in a competitive and turbulent market, to continually innovate and re-align strategy with time scales too short to be comprehended by strategic planning in the conventional corporate sense.
Comparison of Control and Strategy Data
Operational Data:
- short-lived, rapidly changing
- requires record-level access
- repetitive standard transactions and access patterns
- updated in real-time
- event-driven, process generates data
Strategic Data:
- long-living, static
- data aggregated into sets (which is why warehouse data is friendly to RDBMS)
- ad-hoc queries with some periodic reporting
- updated periodically with mass loads
- data-driven, data governs process
Information Requirements by Management Level (Source: Gorry and Scott Morton)
Characteristics of Information | Operational Control | Management Control | Strategic Planning
Source | Largely Internal | | External
Scope | Well defined, narrow | | Very wide
Level of Aggregation | Detailed | | Aggregate
Time Horizon | Historical | | Future
Currency | Highly current | | Quite Old
Required Accuracy | High | | Low
Frequency of Use | Very frequent | | Infrequent
Dimensional Modeling
Dimensional Modeling gives us a way to visualize data.
The CEO's perspective:
- "We sell products in various markets, and we measure our performance over time."
From the data warehouse designer's perspective, we hear three dimensions:
- We sell Products
- in various Markets
- and measure performance over time.
Dimensional Modeling - contd.
Management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.
The more dimensions involved, the more difficult it is to represent in a single table or graph.
The ability to add and modify the dimensions used in a table or graph is often known as "slicing and dicing" the data.
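A rough sketch of slicing and dicing in plain Python follows. The sales records, the field names, and the `slice_and_dice` helper are all hypothetical, invented for illustration. The idea it shows is that the same records can be filtered (sliced) on one dimension and regrouped (diced) by any combination of the others.

```python
from collections import defaultdict

# Hypothetical records: (city, product, period, salesperson, store, dollars)
SALES = [
    ("Chicago", "Widget", "Q1", "Lee", "Store 1", 100),
    ("Chicago", "Widget", "Q2", "Lee", "Store 1", 150),
    ("Chicago", "Zapper", "Q1", "Kim", "Store 2", 80),
    ("Denver", "Widget", "Q1", "Ray", "Store 3", 60),
]
FIELDS = ("city", "product", "period", "salesperson", "store")


def slice_and_dice(records, group_by, **filters):
    """Total dollars grouped by the chosen dimensions, after filtering."""
    totals = defaultdict(int)
    for *dims, dollars in records:
        row = dict(zip(FIELDS, dims))
        # Slice: keep only rows matching every requested dimension value.
        if all(row[name] == value for name, value in filters.items()):
            # Dice: the grouping key is whatever dimensions the user chose.
            key = tuple(row[name] for name in group_by)
            totals[key] += dollars
    return dict(totals)


by_product = slice_and_dice(SALES, ["product"], city="Chicago")
by_product_period = slice_and_dice(SALES, ["product", "period"], city="Chicago")
print(by_product)
print(by_product_period)
```

Adding a dimension to `group_by` is the code equivalent of adding a column to the table or graph, which is why each extra dimension makes a single flat presentation harder to read.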
Dimensional Model of the Business (figure: axes Time, Market, Product)
Data Dependencies Model of a Business (figure: entities include Ship Type, Shipper, Ship To, Product, District Credit, Order Item, Contact Location, Product Line, Sales Order, Customer Location, Product Group, Contract, Contract Type, Customer, Contact, Sales Rep, Sales District, Sales Region, Sales Division)
Transaction Processing
OLTP (On-line Transaction Processing): the point is to get data "in" to the database.
The Relational Model was full of promises for equal access to data.
In the early 1980's the relational model was a dream. Typical transaction rates were one per second.
Today the SABRE system typically processes 4,000 transactions per second, with peak bursts of 13,000 per second.
Segregating Operational and Warehouse Data
In the past, data administrators were constantly told to build data sharing, normalization, and non-redundant corporate databases.
Early attempts at data warehousing tried to share the data with transaction-based systems. This resulted in LONG response times for complex queries.
The idea today is to keep the two separate. Separate databases, and perhaps separate DBMS products and processor platforms are used.
Controlled and practical redundancy is better than out-of-control theoretical purity.
Fundamental Obstacles With Traditional Systems
Systems Integration - Disintegration grew slowly from islands of automation.
- ownership, economic, planning, organizational development issues all contribute.
Hardware Architecture
Inconsistent Data
Data Pollution:
- Bad Application Design (semantic and syntactical differences).
- Ownership
- Data Entry Conventions
The Data Warehouse
Active, tactical, and current events flow from the operational systems to the data warehouse to become static, strategic, and historical data.
The data warehouse becomes a "middle ground" where a large number of disparate and incompatible "legacy systems" are tied to an equally diverse collection of end-user workstations.
Legacy systems usually comprise a hodgepodge of assorted hardware, software, and operational systems accumulated over many decades, and are by nature incompatible with one another and unique to each organization.
Practical Facts About the Warehouse
The chances are remote that any single vendor will be able to develop a product that can interface with all "legacy systems" painlessly and "seamlessly" and at the same time combine data from modern platforms and data external to the organization.
Instead warehouse product vendors develop specialized capabilities to work with various environments.
Typical Dimensional Model
Sales Fact: Time_key, product_key, store_key, dollars_sold, units_sold, dollars_cost
Time Dimension: Time_key, day-of-week, month, quarter, year, holiday_flag
Product Dimension: Product_key, description, brand, category
Store Dimension: Store_key, store_name, address, floor_plan_type
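The model above can be written down as table definitions. The following is a minimal sketch using Python's standard `sqlite3` module, not a definitive design: the column names follow the slide (with `day-of-week` spelled `day_of_week` so it is a valid identifier), while the table names, column types, and key constraints are assumptions added for illustration.

```python
import sqlite3

# Column names come from the slide; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE time_dimension (
    time_key     INTEGER PRIMARY KEY,
    day_of_week  TEXT,
    month        INTEGER,
    quarter      INTEGER,
    year         INTEGER,
    holiday_flag INTEGER
);
CREATE TABLE product_dimension (
    product_key  INTEGER PRIMARY KEY,
    description  TEXT,
    brand        TEXT,
    category     TEXT
);
CREATE TABLE store_dimension (
    store_key       INTEGER PRIMARY KEY,
    store_name      TEXT,
    address         TEXT,
    floor_plan_type TEXT
);
-- The fact table holds one row per (time, product, store) intersection.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dimension (time_key),
    product_key  INTEGER REFERENCES product_dimension (product_key),
    store_key    INTEGER REFERENCES store_dimension (store_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    dollars_cost REAL,
    PRIMARY KEY (time_key, product_key, store_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

tables = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(tables)
print(fact_columns)
```

The composite primary key on the fact table is one way to express that each measurement sits at the intersection of all the dimensions; other designs are possible.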
Fact Table
Fact Table is where numerical measurements of the business are stored.
Each measurement is taken at the intersection of all the dimensions.
The "best" facts are numeric, continuously valued and additive.
Every query made against the fact table may use hundreds of thousands of individual records to construct an answer set.
Dimension Tables
Dimension tables are where textual descriptions of the business are stored.
Each textual description helps to describe a member of the dimension.
Example: each member in the product dimension is a specific product. The product dimension database has many attributes to describe the product.
A key role of the dimension table attribute is to serve as the source of constraints in a query.
Example
Brand | Dollar Sales | Unit Sales
Axon | 780 | 263
Framis | 1044 | 509
Widget | 213 | 444
Zapper | 95 | 39
Example Query
Find all product brands that were sold in the first quarter of 1995 and present the total dollar sales as well as the number of units.
To construct:
- A. Drag attribute brand from product dimension. Place as Row Header. Brand is a collection of individual products.
- B. Drag Dollar Sales and Units Sold from the Fact Table, and place to the right of the Brand row header.
- C. Specify row constraint "1st Q 1995" on the quarter attribute in the Time Dimension Table.
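Steps A to C correspond closely to the clauses of a SQL query: the row header becomes the `GROUP BY` column, the dragged facts become `SUM` aggregates, and the row constraint becomes the `WHERE` clause. The sketch below runs such a query with Python's `sqlite3` module. The table layout is trimmed to the columns the query needs, and the individual fact rows are invented, chosen only so that the brand totals match the figures on the Example slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dimension (time_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);
CREATE TABLE product_dimension (product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT);
CREATE TABLE sales_fact (time_key INTEGER, product_key INTEGER, store_key INTEGER,
                         dollars_sold INTEGER, units_sold INTEGER);
""")

# Invented detail rows: only the per-brand totals come from the Example slide.
conn.executemany("INSERT INTO time_dimension VALUES (?, ?, ?)",
                 [(1, 1, 1995), (2, 2, 1995)])
conn.executemany("INSERT INTO product_dimension VALUES (?, ?, ?)", [
    (1, "Axon small", "Axon"), (2, "Axon large", "Axon"),
    (3, "Framis", "Framis"), (4, "Widget", "Widget"), (5, "Zapper", "Zapper"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 500, 163), (1, 2, 1, 280, 100),  # two Axon products, Q1 1995
    (1, 3, 1, 1044, 509), (1, 4, 1, 213, 444), (1, 5, 1, 95, 39),
    (2, 1, 1, 999, 10),                        # Q2 1995 row, filtered out
])

QUERY = """
SELECT p.brand, SUM(f.dollars_sold), SUM(f.units_sold)   -- steps A and B
FROM sales_fact AS f
JOIN product_dimension AS p ON p.product_key = f.product_key
JOIN time_dimension AS t ON t.time_key = f.time_key
WHERE t.quarter = 1 AND t.year = 1995                     -- step C
GROUP BY p.brand
ORDER BY p.brand
"""
answer_set = conn.execute(QUERY).fetchall()
for brand, dollars, units in answer_set:
    print(f"{brand:<8}{dollars:>6}{units:>6}")
```

Because Axon is represented by two product rows, the sketch also shows brand acting as a collection of individual products: the fact rows are additive, so they roll up cleanly to the brand level.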
Multidimensionality
Examples of dimensions
- products, salespeople, market segments, business units, geographical locations, distribution channels, country, industry
Examples of Facts or Measures:
- money, sales volume, head count, inventory, profit, actual vs. forecast
Examples of Time:
- daily, weekly, monthly, quarterly, yearly
Components of a Data Warehouse - 1
Acquisition - The first component handles acquisition of data from legacy systems and outside sources. Data is identified, copied, formatted, and prepared for loading into a warehouse. Vendors provide tools for extraction and preparation.
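As a rough illustration of the "identified, copied, formatted, and prepared for loading" steps, the Python sketch below takes rows from two hypothetical legacy extracts that use different field names, number formats, and date conventions, and converts them to one common record layout. Every source name, field name, and value is an assumption made up for the example.

```python
from datetime import datetime

# Hypothetical extracts from two legacy systems with different conventions.
BILLING_ROWS = [{"CUST": " acme corp ", "AMT": "1,200.50", "DT": "04/18/2000"}]
SALES_ROWS = [{"customer": "ACME CORP", "amount": 310.0, "date": "2000-04-19"}]


def from_billing(row):
    """Format a billing-system row into the common warehouse layout."""
    return {
        "customer": row["CUST"].strip().upper(),
        "amount": float(row["AMT"].replace(",", "")),
        "date": datetime.strptime(row["DT"], "%m/%d/%Y").date().isoformat(),
    }


def from_sales(row):
    """Format a sales-system row into the common warehouse layout."""
    return {
        "customer": row["customer"].strip().upper(),
        "amount": float(row["amount"]),
        "date": datetime.strptime(row["date"], "%Y-%m-%d").date().isoformat(),
    }


def prepare_for_load(billing_rows, sales_rows):
    """Copy and format rows from both sources, then order them for loading."""
    staged = [from_billing(r) for r in billing_rows]
    staged += [from_sales(r) for r in sales_rows]
    return sorted(staged, key=lambda r: (r["date"], r["customer"]))


staged = prepare_for_load(BILLING_ROWS, SALES_ROWS)
for record in staged:
    print(record)
```

Commercial extraction tools do far more than this (change detection, error handling, scheduling); the sketch only shows why a formatting step is needed before data from different systems can sit in one warehouse.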
Components of a Data Warehouse - 2
Storage Area - The second component is the storage area managed by relational databases, multi-dimensional databases, specialized hardware (symmetric multiprocessor (SMP) or massively parallel processor (MPP) machines), or by software. The storage component holds the data so that many different data mining, executive information and decision support systems can make use of it effectively.
Components of a Data Warehouse - 3
Access - The third component of the warehouse is the access area. Different end-user PCs and workstations draw data from the warehouse with the help of multidimensional analysis tools, neural networks, data discovery tools, or analysis tools. These "smart" data-mining tools are the driving force behind the data warehouse concept. What good is it to store all the information without some way to understand it in new and different ways?
Data Warehouse Access Tools
Intelligent Agents and Agencies - tools work and think for the user.
Query Facilities and Managed Query environments.
Statistical Analysis - One of the biggest surprises in the data warehousing marketplace is the resurgence of interest in traditional statistical analysis, and the concomitant resurrection of the popularity of products like SAS and SPSS.
Data Warehouse Access Tools - 2
Data Discovery - A large class of tools formerly classified as decision support, artificial intelligence and expert systems. They now make use of neural networks, fuzzy logic, decision trees, and other tools from advanced mathematics to allow a user to "sift" through massive amounts of raw data to "discover" new, insightful, interesting, and in many cases useful things about the organization, its operations, and its markets.
Currently there are nearly 60 different data discovery tools/products on the market.
Data Warehouse Access Tools - 3
OLAP - On-line Analytical Processing often uses multi-dimensional spreadsheet tools allowing users to look at information from many different angles. Users are able to "slice and dice" reports and to look at the same kinds of information at different levels at the same time.
A typical OLAP application might allow a product manager to view sales figures for a given product at the national level, see them broken down by division, drill down to see territories within a division, check sales numbers for each store within a territory, and then compare them against sales of stores from another territory.
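The product manager's walk from national totals down to individual stores can be sketched as repeated roll-ups over one hierarchy. The Python below is a toy illustration, not how any OLAP product is implemented: the division, territory, and store names, the dollar figures, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales for one product: (division, territory, store, dollars)
SALES = [
    ("East", "Northeast", "Boston", 120),
    ("East", "Northeast", "Albany", 80),
    ("East", "Southeast", "Miami", 150),
    ("West", "Pacific", "Seattle", 200),
]
LEVELS = ("division", "territory", "store")


def roll_up(records, level, **path):
    """Sum dollars at one hierarchy level, restricted to the drilled-into path."""
    depth = LEVELS.index(level)
    totals = defaultdict(int)
    for *dims, dollars in records:
        row = dict(zip(LEVELS, dims))
        if all(row[name] == value for name, value in path.items()):
            totals[dims[depth]] += dollars
    return dict(totals)


national = sum(dollars for *_, dollars in SALES)               # national level
by_division = roll_up(SALES, "division")                       # by division
east_territories = roll_up(SALES, "territory", division="East")  # drill down
northeast_stores = roll_up(
    SALES, "store", division="East", territory="Northeast"
)                                                               # store level
southeast_stores = roll_up(
    SALES, "store", division="East", territory="Southeast"
)                                                               # comparison

print(national, by_division, east_territories)
print(northeast_stores, southeast_stores)
```

Each call answers the same question at a different level of the hierarchy, which is the sense in which users look at the same kind of information at different levels.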
Data Warehouse Access Tools - 4
Data Visualization - These tools turn ugly, boring numbers into exciting visual presentations. These tools bring graphical representation to new heights.
Example: Geographical information systems turn data about stores, individuals, or anything else into compelling, easy to understand, dynamic maps. PC-based Geographic Information systems have the ability to display spatial occurrences and the relationship between and among geographically specific variables.
Developing the Data Warehouse
The most expensive warehousing ventures involve major new hardware acquisitions and significant investments in training, analysis, and systems development costs.
Typical startup projects allocate 60% of budget for hardware and software for creation of a powerful storage component, 30% on data mining and acquisition tools.
Budgeting for Systems Analysis and Development has 50% of budget on acquisition capabilities, 20% creation of databases in the storage component, 30% fund user solutions.
Developing the Data Warehouse
Clarify what you want to do with the Warehouse - How Will It be Used.
Scrutinize the offerings of vendors and systems integrators. Make sure you understand which functions they provide, and which you must build.
Most successful projects start as small, tightly defined tactical systems to solve pressing business needs, and develop into larger systems over time.
DW Summary: Key Concepts
The DW is a "collection of integrated, subject-oriented databases designed to support the decision support function where each unit of data is non-volatile and relevant to some moment in time" (W.H. Inmon, 1992).
Implicit Assumptions:
- physically separate from operational systems
- hold aggregated data and transactional (atomic) data for management separate from those used for OLTP.
Data Warehousing . not updated) time variant (kept for long periods.DW Summary: Characteristics Subject-orientation integrated non-volatile (i. for forecasting and trend analysis) summarized large volume not normalized metadata data sources 59 CIS 465 .e.
Data Mining