Introduction to Data Warehouse

(slides in this section are used courtesy of Carrig Emerging Technology, Ph: 410-553-6760, www.carriget.com)

Introduction to Data Warehousing and Data Mining

1) Data Warehouse Introduction
2) Engineering Conflicts
3) OLTP and DSS
4) Stovepipe vs. Integration
5) Data Warehouse Solution
6) Enterprise Information System
7) Security in a Data Warehouse
8) Moving Data to a Data Warehouse
9) Data Marts
10) Data Mining

Introduction
• Key topics for this course include:
– Data Warehouse
– Data Mart
– Data Mining
• Background and review of relational database systems
• Main focus on data warehouse and data mining

Data Warehouse Introduction
• A data warehouse is a single source for key, corporate information needed to enable business decisions
• A database application is a piece of software that provides a user interface for users to add, delete, query and update data
• Typically, a database management system is used to actually do the work of adding, deleting, querying or updating data

[Figure: Application, Database System, Data]

Engineering Conflicts, Query and Update
• It is often an engineering problem when data is updated and long-running queries occur at the same time
• In some cases, the users who are doing updates must wait for queries to complete
• One way to avoid this is to make a read-only copy of data

[Figure: Application, Database System, Data for update, Data for query]

OLTP and DSS Defined
• An application that updates is called an on-line transaction processing (OLTP) application
• An application that issues queries to the read-only database is called a decision support system (DSS)

[Figure: OLTP Application, DSS Application, Database System, OLTP Data, DSS Data]

Applications in a Typical Enterprise
• Most organizations have several disparate OLTP/DSS applications in several databases

[Figure: Finance, Inventory and Sales OLTP applications and Finance, Inventory and Sales DSS applications on a database system, each with its own OLTP data and DSS data]

Stovepipe vs Integration
• When systems stand by themselves they are often referred to as "stovepipes"
• Systems that easily share data are called "well integrated systems"

[Figure: Finance and Inventory OLTP applications, Finance and Inventory DSS applications]

Problems with Stovepipe Architecture
• Problems:
– Users who wish to access data must query several different DSS to find it
– Data may have fundamental conflicts between DSS
– a department code table in one DSS may differ in another DSS
– a measurement may be stored in meters in one DSS and yards in another
• Solution:
– Use a data warehouse, where data is integrated from the several different stovepipe systems
– Data warehouse is really sharing-lite: you don't have to co-ordinate as much when applications are built and you still reap the benefits of data sharing

Data Warehouse Solution
• A data warehouse is an attempt to integrate separate DSS so that users can query one place to find the answers to their questions
• A data warehouse has the key, corporate data in the organization
• A data warehouse tracks historical data

Data Warehouse: A Success Story
• Largest data warehouse is Wal-Mart (9 TB)
• Uses for Wal-Mart data warehouse
– Identifies where a new store should be built based on customer demand
– Identifies how stores are performing across the nation
– Contains every "scan" from every purchase
• Benefits Wal-Mart gained from their data warehouse
– Provided competitive advantage over K-Mart
– Reduced excess inventory in individual stores
– Avoided wasted funds in building stores which would fail

Selling the Data Warehouse
• A data warehouse project will fail without corporate sponsorship
– Preferably, the project should be sponsored by the CEO
– The CEO must be sold on the value to the business to improve competitive advantage by deploying a data warehouse
• If an active, corporate sponsor does not exist, data sources will be very difficult to identify
• Only add data to the warehouse that will answer key, corporate questions asked by the corporate sponsor. Otherwise, you will have a data dump

Building a Useful Data Warehouse
• You really need:
– strong executive sponsorship
– good knowledge of the data
– sound software engineering
– stability from source systems
– users who want a success
• A 75 percent failure rate is often cited
• It is WORTH the effort!!!

Enterprise Information System
• An EIS (Enterprise Information System) allows users to query data in a data warehouse
• Users can access key, corporate data in the data warehouse

[Figure: Enterprise Information System, Data Warehouse]

Users of an Enterprise Information System
• Frequently, multiple EIS are needed to satisfy different types of users
– Some users only want a system that has pre-defined reports so they only need to "click one button" to see data they need. These users want the system to be no harder to use than a "coffee pot"
– Other users want to delve into the data and build their own queries
• Executives want high-level, summary data and a simple tool
– Must be VERY easy to use, users want to click a few buttons and get data they want
– Results must be graphs
– Users should be able to drill-down into key areas

Users of an Enterprise Information System
• Analysts want a flexible, more detailed tool
– Often very knowledgeable about the data
– Willing to do more work to learn about the data
– Sometimes even learn SQL to issue their own ad-hoc queries
• General users want a tool that provides detailed data, but is very easy to use
– Want access to the data warehouse to do routine tasks such as "Find me Hank's phone number", etc.
– Simple application, but not so focused on large reports

Data Warehouse / EIS

[Figure: Finance, Inventory and Sales OLTP applications with their OLTP data; a Data Warehouse with Finance, Inventory and Sales subject areas; an Enterprise Information System]

Need for Data Warehouses
• Data warehouses provide a single place to store key corporate data
– The idea is that users can go one place to find this key data using an enterprise information system (EIS)
• Data warehouse is also a place to store and access historical data
– Users measure performance goals for their company over a period of time
– Company statistics are available
– Data not stored in the same place is difficult to locate and compare, easily lost
– Single query can be used to access key data

Security in Data Warehouse
• Building a data warehouse does increase security risk because key, corporate information is all in one place
• To mitigate that risk, database system components can be used to protect the data warehouse. These include
– Views
– Access control
– Security Administration
– Encryption
– Audit

Moving Data into the Data Warehouse
• Moving data from source OLTP systems to the data warehouse is the hard part of data warehousing
• Updates to the data warehouse are performed periodically
– weekly
– nightly
– monthly
• Occasionally, real-time data is needed in a data warehouse, but this is not very common

Using Middleware to Move Data
• Data can be moved to the warehouse via data migration software
• This is often called "middleware" because it sits between the source OLTP and the data warehouse

[Figure: Source OLTP System, Data Warehouse Migration Software ("Middleware"), Data Warehouse]

Need for a Data Mart
• A data mart is a subset of the data warehouse that may make it simpler for users to access key corporate data
– Sometimes, users only need a piece of data from the data warehouse
• The data mart is typically fed from the data warehouse

[Figure: Data Warehouse with Inventory, Finance and Sales subject areas; New York Data Mart; California Data Mart]

Data Mart in Action

[Figure: Finance, Inventory and Sales OLTP applications with their OLTP data; a Data Warehouse with Finance, Inventory and Sales subject areas; California Data Mart; New York Data Mart; an Enterprise Information System]

Data Mining Introduction
• Data Mining is done by running software that examines a database and looks for patterns in the data
• A data warehouse by itself will respond to queries from users
– It will not tell users about patterns in data that users may not have thought about
– To find patterns in data, data mining is used to try and mine key information from a data warehouse

Advantages of Data Mining
• Data mining allows companies to collect information that makes them more productive and helps them beat their competition
• Data mining helps identify
– why customers buy certain products
– ideas for very direct marketing
– ideas for shelf placement
– training of employees vs. employee retention
– employee benefits vs. employee retention

Implementing Data Mining
• Apply data mining tools to run data mining algorithms against data
• There are two approaches:
– Copy data from the Data Warehouse and mine it
– Mine the data in the Data Warehouse
• Popular tools use a variety of different data mining algorithms:
– association rules
– genetic algorithms
– decision trees
– neural networks

Data Mining using Separate Data
• You can move data from the data warehouse to data mining tools
– Advantages
– Data mining tools may organize data so they can run faster
– Disadvantages
– Could be very expensive to move large amounts of data

[Figure: Data Warehouse, Data Mining Tool, copy of data made by the Data Mining Tool]

Data Mining Against the Data Warehouse
• Data mining tools can access data directly in the Data Warehouse
– Advantages
– No copy of data is needed for data mining
– Disadvantages
– Data may not be organized in a way that is efficient for the tool

[Figure: Data Warehouse, Data Mining Tool]

Data Mining: Summary
• Data mining attempts to find patterns in data that we did not know about
• Often data mining is just a new buzzword for statistics
• Data mining differs from statistics in that large volumes of data are used
• Many different data mining algorithms exist and we will discuss them in the course
• Examples
– identify users who are most likely to commit credit card fraud
– identify which attributes of a person most often result in them buying product x

SQL Review

(slides in this section are used courtesy of Carrig Emerging Technology, Ph: 410-553-6760, www.carriget.com)

Introduction to SQL

1) Introduction to SQL
2) Data Definition Language (DDL)
3) Data Manipulation Language (DML)
4) SELECT Construct
5) SELECT Operators
6) Wildcard Searches
7) Aggregate Operators
8) Calculated Attributes
9) Sorting Results

Introduction to Structured Query Language
• Structured Query Language (SQL) is the language used to communicate with a relational database
– Industry standard
– Based on set theory
• SQL is composed of two types of constructs:
– Data Definition Language (DDL)
– Defines the structure of the database
– Data Manipulation Language (DML)
– Provides the constructs to input and retrieve data

SQL Overview: DDL
• Data Definition Language (DDL) is used to describe the structure of the database
– Create tables, indexes, etc.
– Typical operations are:
– CREATE TABLE defines what columns are in the table and establishes the table
– CREATE INDEX defines an index for the table. Indexes are used to improve database performance

SQL Overview: DML
• Data Manipulation Language (DML) is used for storing, updating, and retrieving data.
• Typical operations include:
– SELECT is used to retrieve data.
– Ex: SELECT * FROM PRODUCTS
– INSERT is used to add new rows to the database.
– INSERT INTO PRODUCTS VALUES ('food', 'hardware', 'housewares')
– UPDATE is used to change rows that already exist in the database.
– UPDATE PRODUCTS SET PRICE = PRICE + 4
– DELETE is used to eliminate rows of data from the database.
– DELETE FROM PRODUCTS
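The DDL and DML operations listed above can be tried end to end in a throwaway database. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the database system; the PRODUCTS column names, the index name and the sample rows are illustrative assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the table structure and an index (names are assumptions)
conn.execute("CREATE TABLE PRODUCTS (ProductType TEXT, ProductName TEXT, Price REAL)")
conn.execute("CREATE INDEX idx_products_type ON PRODUCTS (ProductType)")

# DML: INSERT adds new rows
conn.executemany(
    "INSERT INTO PRODUCTS VALUES (?, ?, ?)",
    [("food", "Cereal", 3.0), ("hardware", "Hammer", 9.0)],
)

# DML: UPDATE changes existing rows (no WHERE clause, so every row changes)
conn.execute("UPDATE PRODUCTS SET Price = Price + 4")
prices_after_update = [
    row[0] for row in conn.execute("SELECT Price FROM PRODUCTS ORDER BY Price")
]

# DML: DELETE without WHERE removes every row but keeps the table itself
conn.execute("DELETE FROM PRODUCTS")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM PRODUCTS").fetchone()[0]

print(prices_after_update, rows_after_delete)
```

Note that UPDATE and DELETE without a WHERE clause touch every row, which is why the later slides pair them with WHERE.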

SELECT Overview
• SELECT is used to retrieve records from the database.
• Single table SELECT constructs:
– WHERE
– IN
– BETWEEN
– LIKE
– Aggregate Operators
– DISTINCT
– ORDER BY

SELECT Examples
• Query Purpose: Retrieve names and prices of all products
    SELECT ProductName, Price FROM TinyProducts
• Query Purpose: Retrieve all information for all products from the TinyProducts table
    SELECT * FROM TinyProducts

SELECT with WHERE
• The WHERE clause is used to filter which information is returned from a SELECT
• Query Purpose: Retrieve all information only for product type of "food"
    SELECT * FROM TinyProducts WHERE ProductType = 'Food'

Use of Boolean Operators
• Conditions can be combined with Boolean operators:
– AND, OR, NOT
• Query Purpose: List all information about food products that are either cereal or fruit
    SELECT * FROM TinyProducts WHERE (ProductName = 'Cereal') OR (ProductName = 'Fruit')

Boolean Operator Example
• Query Purpose: List the type and name of all fruit products whose price is less than $2.00
    SELECT ProductType, ProductName FROM TinyProducts WHERE Price < 2 AND ProductName = 'Fruit'

IN Operator
• The IN operator allows a search for records that match one value in a set of unordered values
• Example questions to use IN:
– 'Find all products whose type is Food, Hardware, or Housewares'
– 'Find all food whose type is Meat, Fish, Vegetables, or Fruit'

IN Example
• Query Purpose: List the name of Housewares that are Cookware, Linens, or Dishes
    SELECT ProductName, ProductType FROM TinyProducts WHERE ProductName IN ('Cookware', 'Linens', 'Dishes')
instead of:
    SELECT ProductName, ProductType FROM TinyProducts WHERE (ProductName = 'Cookware') OR (ProductName = 'Linens') OR (ProductName = 'Dishes')

BETWEEN Operator
• The BETWEEN operator allows a search for a range of values
• Example Queries:
– 'Find all fruit between Bananas and Grapes'
– 'Find all cereals whose price is between $1.50 and $4.00 a box'

[Figure: range from 1.50 to 4.00]

BETWEEN Example
• Query Purpose: Find all products whose price is between $2.00 and $8.00
    SELECT ProductName, Price FROM TinyProducts WHERE Price BETWEEN 2.00 AND 8.00
instead of:
    SELECT ProductName, Price FROM TinyProducts WHERE (Price >= 2.00) AND (Price <= 8.00)

Wildcard Searches of Strings
• The LIKE operator is used to search parts of a string
• The following wildcard characters are used:
– % to match any zero or more characters
– _ to match exactly one character
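To make the BETWEEN and LIKE rules above concrete, here is a small sketch using Python's sqlite3 module. The TinyProducts columns follow the slides, but the rows, the `names` helper and the SKU values are made up for illustration. It checks that BETWEEN includes both endpoints and shows how % and _ differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TinyProducts (ProductName TEXT, Price REAL, SKUNumber TEXT)")
conn.executemany(
    "INSERT INTO TinyProducts VALUES (?, ?, ?)",
    [("Cereal", 2.00, "123"), ("Cookware", 8.00, "456"),
     ("Candy", 0.75, "223"), ("Fruit", 1.50, "1234")],  # illustrative rows
)

def names(where_clause):
    """Return product names matching the WHERE clause, sorted by name."""
    sql = ("SELECT ProductName FROM TinyProducts WHERE "
           + where_clause + " ORDER BY ProductName")
    return [row[0] for row in conn.execute(sql)]

# BETWEEN is inclusive, so it matches the explicit >= / <= form
between = names("Price BETWEEN 2.00 AND 8.00")
explicit = names("Price >= 2.00 AND Price <= 8.00")

# % matches any run of characters, _ matches exactly one character
starts_with_c = names("ProductName LIKE 'C%'")
sku_x23 = names("SKUNumber LIKE '_23'")  # three characters ending in 23

print(between, explicit, starts_with_c, sku_x23)
```

The SKU '1234' is not matched by '_23' because the pattern describes exactly three characters.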

Wildcard Search Examples
• Query Purpose: List all products whose name starts with a 'C'
    SELECT * FROM TinyProducts WHERE ProductName LIKE 'C%'
• Query Purpose: List all products that have a SKU number with the last 2 characters of '23' when you don't know the first character
    SELECT * FROM TinyProducts WHERE SKUNumber LIKE '_23'

Aggregate Operators
• MIN, MAX, and AVG are used when computing statistics on a range of data
• Query Examples:
– 'What is the highest batting average on the team?'
– 'What is the average number of hits for all the little league teams in the National League?'
– 'What are the names of the players that had the lowest average on the little league team?'

Aggregate Operators Example
• Query Purpose: Find the minimum, maximum, and average batting average of all players in the National League of Little League
    SELECT MIN(Average), MAX(Average), AVG(Average) FROM PLAYERS WHERE League = 'National'

SUM and COUNT Operators
• Use the SUM operator to total the results of a query
• COUNT will count the total number of occurrences of an item in a search

[Figure: 1+2+3+4]

SUM and COUNT Examples
• Query Purpose: Find the total number of home runs hit by all players in the American League
    SELECT SUM(HomeRuns) FROM PLAYERS WHERE League = 'American'
• Query Purpose: Count the players that have hit 3 home runs in the National League
    SELECT COUNT(*) FROM PLAYERS WHERE HomeRuns = 3 AND League = 'National'

Calculated Attributes
• A new attribute can be obtained by using arithmetic operators (+, -, *, /) on other numeric attributes
• All operators follow standard precedence:
– Multiplication and division are computed first, left to right
– Addition and subtraction are computed last, left to right
– Use parentheses to override the standard precedence

Calculated Attributes Example
• Query Purpose: List all players with their hits, at bats, and their batting average
    SELECT Name, Hits, AtBats, (Hits / AtBats) FROM PLAYERS

DISTINCT Operator
• DISTINCT is used to exclude duplicate occurrences in the result of a query
• Query Purpose: List all distinct batting averages
    SELECT DISTINCT Average FROM PLAYERS
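One practical caveat with a calculated attribute such as Hits / AtBats: when both columns are integers, some database systems perform integer division. The sketch below assumes SQLite through Python's sqlite3 module, where that is the case; the PLAYERS rows are invented for illustration. It also shows DISTINCT applied to the computed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PLAYERS (Name TEXT, Hits INTEGER, AtBats INTEGER)")
conn.executemany(
    "INSERT INTO PLAYERS VALUES (?, ?, ?)",
    [("Ann", 1, 4), ("Bob", 2, 8), ("Cy", 1, 2)],  # illustrative rows
)

# Integer / integer is integer division in SQLite, so every average becomes 0
integer_division = list(conn.execute(
    "SELECT Name, Hits / AtBats FROM PLAYERS ORDER BY Name"))

# Multiplying by 1.0 first forces real arithmetic (* and / evaluate left to right)
real_division = list(conn.execute(
    "SELECT Name, Hits * 1.0 / AtBats FROM PLAYERS ORDER BY Name"))

# DISTINCT removes the duplicate 0.25 shared by Ann and Bob
distinct_averages = [row[0] for row in conn.execute(
    "SELECT DISTINCT Hits * 1.0 / AtBats FROM PLAYERS ORDER BY 1")]

print(integer_division, real_division, distinct_averages)
```

Other database systems may divide integers differently, so it is worth checking the behaviour of the system in use.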

Sorting Query Results
• The ORDER BY clause is used at the end of the SELECT statement to sort the results of a query
• Use DESC on the end of the ORDER BY clause to sort the data in descending order. Otherwise, the result will be in ascending order

Sorting Example
• Query Purpose: List all players in ascending order of their batting average
    SELECT Name, Average FROM PLAYERS ORDER BY Average
• For descending order add the keyword DESC
    SELECT Name, Average FROM PLAYERS ORDER BY Name DESC
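The sorting rules above can be checked with a short sketch, again assuming Python's sqlite3 module and made-up PLAYERS rows. It sorts ascending (the default), descending with DESC, and by the position of a column in the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PLAYERS (Name TEXT, Average REAL)")
conn.executemany(
    "INSERT INTO PLAYERS VALUES (?, ?)",
    [("Ann", 0.300), ("Bob", 0.250), ("Cy", 0.410)],  # illustrative rows
)

def player_names(order_by):
    """Return player names in the order produced by the ORDER BY clause."""
    sql = "SELECT Name, Average FROM PLAYERS ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]

ascending = player_names("Average")        # ascending is the default
descending = player_names("Average DESC")  # DESC reverses the order
by_position = player_names("2 DESC")       # 2 = second column in the SELECT list

print(ascending, descending, by_position)
```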

Sorting Calculated Attributes
• To refer to a computed attribute in the ORDER BY, use its position in the list of columns following SELECT
• Query Purpose: List all players in descending order of their batting average (here we assume batting average is computed at the time of the query)
    SELECT Name, Hits, AtBats, Hits / AtBats FROM PLAYERS ORDER BY 4 DESC

More SQL

1) GROUP BY Construct
2) HAVING Filter
3) Multiple Tables
4) Joins
5) Equijoins
6) Cartesian Product
7) Nulls
8) OUTER JOIN

GROUP BY Clause
• GROUP BY will partition a table into multiple groups of related rows.
• As an example, consider the EMPLOYEE table where Department partitions the EMPLOYEE set into subsets:

[Figure: EMPLOYEE partitioned into Engineering, Marketing, Finance, Customer]

GROUP BY Example
• Query Purpose: For each department, list the average salary using the EMPLOYEE table
    SELECT Department, AVG(Salary) FROM EMPLOYEE GROUP BY Department

GROUP BY With WHERE
• To filter data further, we can use the WHERE clause with the GROUP BY clause
• Query Purpose: For each department, list the highest salary of their administrative assistants.
    SELECT Department, MAX(Salary) FROM EMPLOYEE WHERE Title = 'administrative assistant' GROUP BY Department

HAVING Construct
• HAVING is used to restrict the output of aggregate functions, such as SUM, MIN, MAX and AVG, to only those groups of rows that meet some condition.
• Query Purpose: List the average salary for all departments that have more than three employees.
    SELECT Department, AVG(Salary) FROM EMPLOYEE GROUP BY Department HAVING COUNT(*) > 3
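The difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) is easiest to see on a small table. This is a sketch assuming Python's sqlite3 module; the EMPLOYEE rows, names and titles are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Name TEXT, Department TEXT, Title TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("A", "Engineering", "administrative assistant", 100),
    ("B", "Engineering", "administrative assistant", 200),
    ("C", "Engineering", "engineer", 300),
    ("D", "Engineering", "engineer", 400),
    ("E", "Marketing", "administrative assistant", 150),
    ("F", "Marketing", "analyst", 250),
])

# GROUP BY: one result row per department
avg_by_department = list(conn.execute(
    "SELECT Department, AVG(Salary) FROM EMPLOYEE "
    "GROUP BY Department ORDER BY Department"))

# WHERE removes rows before the groups are formed
max_admin_salary = list(conn.execute(
    "SELECT Department, MAX(Salary) FROM EMPLOYEE "
    "WHERE Title = 'administrative assistant' "
    "GROUP BY Department ORDER BY Department"))

# HAVING removes whole groups after aggregation
large_departments = list(conn.execute(
    "SELECT Department, AVG(Salary) FROM EMPLOYEE "
    "GROUP BY Department HAVING COUNT(*) > 3"))

print(avg_by_department, max_admin_salary, large_departments)
```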

Multi-Table SQL
• It is often necessary to combine data from multiple tables.

EMPLOYEE
EmpID  Name   Salary
1      Fred   200
2      Ethel  300
3      Mike   400
4      David  100

ATTENDS
EmpID  Name
1      Harvard
2      GMU
2      Yale
3      MIT
3      Stanford
3      GMU

Joins
• Joins are the means by which multiple tables can be combined.
• A join allows us to combine data from different tables. A join operation is done through the SELECT construct.
• Types of Joins: Equijoin, Outer Join, Inner Join

Equijoin
• Joins only those rows where a foreign key matches the primary key
• Allows information from multiple tables to be linked together in a single query
• Can be used to link as many tables as needed in a single query

Equijoin Query Example
• Query Purpose: List the names of all colleges attended by Ethel
    SELECT b.Name FROM EMPLOYEE a, ATTENDS b WHERE a.EmpID = b.EmpID AND a.Name = 'Ethel'

Equijoin Example

EMPLOYEE
EmpID  Name   Salary
1      Fred   200
2      Ethel  300
3      Mike   400

ATTENDS
EmpID  College  GPA
1      Harvard  …
2      GMU      …
2      Nova     …
3      Yale     …
3      Nova     2.65
3      GMU      4.0

[… marks GPA values whose digits are scrambled in the source]

Warning about Joining Tables
• A join is really just a subset of a cartesian product. When no fields are 'joined' in the WHERE clause, a cartesian product is produced
– Restated in English: When the linking condition is omitted from the WHERE clause, you get a lot of excess garbage that you probably do not want.
• Sample Query:
    SELECT b.Name FROM EMPLOYEE a, ATTENDS b WHERE a.Name = 'Ethel'
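The warning above can be reproduced with the EMPLOYEE and ATTENDS rows from the example. This sketch assumes Python's sqlite3 module and names the college column College (the sample query on the slide calls it Name); the GPA column is left out because it is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpID INTEGER, Name TEXT, Salary INTEGER)")
conn.execute("CREATE TABLE ATTENDS (EmpID INTEGER, College TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(1, "Fred", 200), (2, "Ethel", 300), (3, "Mike", 400)])
conn.executemany("INSERT INTO ATTENDS VALUES (?, ?)",
                 [(1, "Harvard"), (2, "GMU"), (2, "Nova"),
                  (3, "Yale"), (3, "Nova"), (3, "GMU")])

# Equijoin: the linking condition keeps only Ethel's own ATTENDS rows
equijoin = [row[0] for row in conn.execute(
    "SELECT b.College FROM EMPLOYEE a, ATTENDS b "
    "WHERE a.EmpID = b.EmpID AND a.Name = 'Ethel' ORDER BY b.College")]

# Linking condition omitted: Ethel is paired with every ATTENDS row
no_link = [row[0] for row in conn.execute(
    "SELECT b.College FROM EMPLOYEE a, ATTENDS b "
    "WHERE a.Name = 'Ethel' ORDER BY b.College")]

print(equijoin)
print(len(no_link), no_link)
```

Without the linking condition the query returns colleges Ethel never attended, one row for every row in ATTENDS.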

Cartesian Product
• Each row in one table is paired with every row in the other table

[Figure: result rows with columns a.EmpID, a.Name, a.Salary, ..., b.EmpID, b.GPA; employee 2 (Ethel, 300) is repeated once for each of b.EmpID 1, 2, 3 and 4]

Nulls
• An attribute may be defined as null.
• This indicates that the value is unknown and avoids the need for user-defined special indicators.
• To prevent a column from having nulls, specify NOT NULL on the column in the CREATE TABLE statement when setting up the database.
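A short sketch of the null behaviour described above, assuming Python's sqlite3 module and illustrative EMPLOYEE rows. It shows that unknown values are found with IS NULL rather than with =, and that a NOT NULL column rejects a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmpID and Name may never be null; Salary may be unknown
conn.execute(
    "CREATE TABLE EMPLOYEE (EmpID INTEGER NOT NULL, Name TEXT NOT NULL, Salary REAL)")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Fred', 200)")
conn.execute("INSERT INTO EMPLOYEE VALUES (3, 'Hank', NULL)")

# IS NULL finds rows whose salary is unknown
unknown_salary = [row[0] for row in conn.execute(
    "SELECT Name FROM EMPLOYEE WHERE Salary IS NULL")]

# Comparing with = NULL never evaluates to true, so no rows come back
equals_null = [row[0] for row in conn.execute(
    "SELECT Name FROM EMPLOYEE WHERE Salary = NULL")]

# A null in a NOT NULL column is rejected by the database
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (4, NULL, 100)")
    not_null_rejected = False
except sqlite3.IntegrityError:
    not_null_rejected = True

print(unknown_salary, equals_null, not_null_rejected)
```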

Nulls Examples
• Statement Purpose: Add an employee whose salary is unknown
    INSERT INTO EMPLOYEE VALUES (3, 'Hank', NULL)
• Query Purpose: Find all employees whose salary is unknown (or null)
    SELECT * FROM EMPLOYEE WHERE Salary IS NULL

OUTER JOIN
• An OUTER JOIN is used when the query should return a result row even for rows that do not have corresponding data in one of the tables.
• A LEFT OUTER JOIN returns all rows from the 'left' table.
• Nulls are returned when a row in the 'left' table has no corresponding rows in the right table.

LEFT OUTER JOIN Example
• Query Purpose: List the college GPAs for each employee. Include employees who have not attended any colleges
    SELECT a.Name, b.GPA FROM EMPLOYEE a LEFT OUTER JOIN ATTENDS b ON a.EmpID = b.EmpID

LEFT OUTER JOIN Example
• Result of the outer join
– All employees are listed.
– For an equijoin, only those who attended a college would be listed
– Here, employee number 4 did not attend college, but is still retrieved by the outer join.

Name   GPA
-----  ----
Fred   …
Ethel  …
Ethel  …
Mike   …
Mike   2.65
Mike   4.00
David  NULL

[… marks GPA values whose digits are scrambled in the source]
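The outer join behaviour above can be sketched as follows, assuming Python's sqlite3 module. The EMPLOYEE rows follow the slides; the ATTENDS rows and GPA values are simplified, made-up numbers used only to show that the employee with no college row still appears, with a null GPA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpID INTEGER, Name TEXT, Salary INTEGER)")
conn.execute("CREATE TABLE ATTENDS (EmpID INTEGER, College TEXT, GPA REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(1, "Fred", 200), (2, "Ethel", 300),
                  (3, "Mike", 400), (4, "David", 100)])
conn.executemany("INSERT INTO ATTENDS VALUES (?, ?, ?)",  # illustrative GPAs
                 [(1, "Harvard", 3.1), (2, "GMU", 3.5),
                  (2, "Nova", 2.9), (3, "Yale", 4.0)])

# Equijoin: David has no ATTENDS row, so he is dropped
inner_rows = list(conn.execute(
    "SELECT a.Name, b.GPA FROM EMPLOYEE a, ATTENDS b "
    "WHERE a.EmpID = b.EmpID ORDER BY a.EmpID, b.GPA"))

# LEFT OUTER JOIN: every EMPLOYEE row is kept; missing GPAs come back as NULL
outer_rows = list(conn.execute(
    "SELECT a.Name, b.GPA FROM EMPLOYEE a "
    "LEFT OUTER JOIN ATTENDS b ON a.EmpID = b.EmpID "
    "ORDER BY a.EmpID, b.GPA"))

print(inner_rows)
print(outer_rows)  # the last row is ('David', None); None is Python's NULL
```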

Advanced SQL

(slides in this section are used courtesy of Carrig Emerging Technology, Ph: 410-553-6760, www.carriget.com)

Advanced SQL

1) Finding the nth element in a list
2) Finding the median
3) Correlated subquery
4) Data Definition Language Constructs

Find the Nth Element
• It is very common to try to find the nth element in a list.
– Examples:
– Who makes the second highest salary in marketing department?
– What is the fifth best product in sales?
– This can be done with a program that uses SQL to access the database: SQL is sent to the database and the program keeps retrieving the result set until the threshold is crossed.
• We show another way of doing this using standard SQL.

Find the Nth Element: Example Table
• Consider a table, called TEST, with just one column, x, with the following values:

X
4
5
8

Find the Nth Element: Step 1
• First join TEST with itself; this yields each element matched with every other element:

4  4
4  5
4  8
5  4
5  5
5  8
8  4
8  5
8  8

Find the Nth Element: Step 2
• Next keep only those rows where the first column is greater than or equal to the second column.

4  4
5  4
5  5
8  4
8  5
8  8

Notice the pattern that just developed: each number on the list now has a certain number of values that match on the right. This number matches the position of this value in the list. For example, 4 has only one match as it is the first number in the list, 5 has two matches, 8 has three matches.

Find the Nth Element: Step 3
• Now group by the column on the left and identify the size of each group.

x  COUNT
4  1
5  2
8  3

• The same ideas can be applied to any SELECT statement output.

Finding the Nth Element: Example
• Query Purpose: Find the information about the product with the second highest price.
    SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
    FROM TinyProducts a, TinyProducts b
    WHERE a.Price >= b.Price
    GROUP BY a.ProductName, a.ProductType, a.Price, a.SKUNumber
    HAVING COUNT(*) = (SELECT COUNT(*)-1 FROM TinyProducts)
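The three steps above can be replayed on the TEST table itself. This is a sketch assuming Python's sqlite3 module; the variable names are illustrative. The technique relies on the values being distinct, since ties would give two rows the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TEST (x INTEGER)")
conn.executemany("INSERT INTO TEST VALUES (?)", [(4,), (5,), (8,)])

# Step 1: join TEST with itself, pairing every value with every value
step1 = list(conn.execute("SELECT a.x, b.x FROM TEST a, TEST b"))

# Step 2: keep only the pairs where the left value >= the right value
step2 = list(conn.execute(
    "SELECT a.x, b.x FROM TEST a, TEST b WHERE a.x >= b.x"))

# Step 3: group by the left value; the group size is its position in the list
step3 = list(conn.execute(
    "SELECT a.x, COUNT(*) FROM TEST a, TEST b "
    "WHERE a.x >= b.x GROUP BY a.x ORDER BY a.x"))

# Second highest value: the group whose size is (number of rows - 1)
second_highest = conn.execute(
    "SELECT a.x FROM TEST a, TEST b WHERE a.x >= b.x "
    "GROUP BY a.x HAVING COUNT(*) = (SELECT COUNT(*) - 1 FROM TEST)"
).fetchone()[0]

print(len(step1), len(step2), step3, second_highest)
```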

Finding the Top N Elements: Example
• To ask for the top n values instead of the nth value, specify a range (>=) instead of just an equality (=) in the HAVING.
• Query Purpose: Find information about the products with the two highest prices.
    SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
    FROM TinyProducts a, TinyProducts b
    WHERE a.Price >= b.Price
    GROUP BY a.ProductName, a.ProductType, a.Price, a.SKUNumber
    HAVING COUNT(*) >= (SELECT COUNT(*)-1 FROM TinyProducts)
    ORDER BY a.Price

Finding the Median
• The median is defined as the element in the middle of the list.
• Query Purpose: Find the median price in TinyProducts.
    SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
    FROM TinyProducts a, TinyProducts b
    WHERE a.Price >= b.Price
    GROUP BY a.ProductName, a.ProductType, a.Price, a.SKUNumber
    HAVING COUNT(*) = (SELECT (COUNT(*)/2)+1 FROM TinyProducts)
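The top-n and median queries above can be checked on a small table. This sketch assumes Python's sqlite3 module, a two-column TinyProducts table and five made-up products with distinct prices. With an odd number of rows, COUNT(*)/2 + 1 (integer division) picks the middle row; with an even number it would pick the upper of the two middle rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TinyProducts (ProductName TEXT, Price REAL)")
conn.executemany("INSERT INTO TinyProducts VALUES (?, ?)", [
    ("Fruit", 1.0), ("Cereal", 2.0), ("Linens", 3.0),
    ("Dishes", 7.0), ("Cookware", 9.0),  # illustrative rows, distinct prices
])

# Shared self-join: each product's group size is its rank from the cheapest
ranked = (
    "SELECT a.ProductName, a.Price "
    "FROM TinyProducts a, TinyProducts b "
    "WHERE a.Price >= b.Price "
    "GROUP BY a.ProductName, a.Price "
)

# Top two prices: rank >= (row count - 1)
top_two = list(conn.execute(
    ranked + "HAVING COUNT(*) >= (SELECT COUNT(*) - 1 FROM TinyProducts) "
    "ORDER BY a.Price"))

# Median: rank = (row count / 2) + 1, using integer division
median = list(conn.execute(
    ranked + "HAVING COUNT(*) = (SELECT (COUNT(*) / 2) + 1 FROM TinyProducts)"))

print(top_two, median)
```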

Using Subqueries
• A subquery may be used in the middle of a query.
• Query Purpose: Find the information about the highest priced product, using a simple subquery.
    SELECT a.ProductName, a.ProductType, a.Price, a.SKUNumber
    FROM TinyProducts a
    WHERE Price = (SELECT MAX(PRICE) FROM TinyProducts)

Correlated Subquery
• If the subquery references a data element from outside of the subquery, it is called a correlated subquery.
– For each row in the outer part of the query, the correlated subquery is executed. The following query will indicate who makes more money than 'Ethel'
    SELECT a.Name, a.Salary
    FROM Employee a
    WHERE EXISTS (SELECT b.Salary
                  FROM Employee b
                  WHERE a.Salary > b.Salary
                  AND b.Name = 'Ethel')
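Both kinds of subquery above can be run against a small Employee table. This is a sketch assuming Python's sqlite3 module; the first four rows follow the EMPLOYEE table used earlier in the slides, and 'Sue' is an added illustrative row so that more than one employee earns more than Ethel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("Fred", 200), ("Ethel", 300), ("Mike", 400), ("David", 100),
    ("Sue", 500),  # illustrative extra row
])

# Simple subquery: the inner SELECT runs once and yields a single value
highest_paid = [row[0] for row in conn.execute(
    "SELECT Name FROM Employee "
    "WHERE Salary = (SELECT MAX(Salary) FROM Employee)")]

# Correlated subquery: the inner SELECT refers to a.Salary from the outer row,
# so conceptually it is evaluated once per outer row
earns_more_than_ethel = [row[0] for row in conn.execute(
    "SELECT a.Name FROM Employee a "
    "WHERE EXISTS (SELECT b.Salary FROM Employee b "
    "              WHERE a.Salary > b.Salary AND b.Name = 'Ethel') "
    "ORDER BY a.Name")]

print(highest_paid, earns_more_than_ethel)
```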

Other Data Manipulation
• INSERT – Add rows to a single table
• UPDATE – Modify rows in a single table
• DELETE – Remove rows from a single table

INSERT Examples
• Statement Purpose: Add a record for employee #1, 'Fred' with a salary of 200 to the EMPLOYEE table
    INSERT INTO Employee VALUES (1, 'Fred', 200)
• Statement Purpose: Copy all rows in the EMPLOYEE table and place them in NEW_EMPLOYEE
    INSERT INTO New_Employee SELECT * FROM Employee

UPDATE Example
• Statement Purpose: Modify Fred's salary to 150
    UPDATE Employee SET Salary = 150.00 WHERE Name = 'Fred'
• Statement Purpose: Give all employees a ten percent raise
    UPDATE Employee SET Salary = Salary * 1.10

DELETE Examples
• Statement Purpose: Remove all employees who have a salary higher than 100.
    DELETE FROM Employee WHERE Salary > 100
• To remove all employees:
    DELETE FROM Employee

CREATE TABLE Example
• Statement Purpose: Create a table to store employee information
    CREATE TABLE EMPLOYEE (EmpId SMALLINT, Name CHAR(10), Salary DECIMAL(5,2))
• To drop the EMPLOYEE table:
    DROP TABLE EMPLOYEE

Data Warehouse Security

(slides in this section are used courtesy of Carrig Emerging Technology, Ph: 410-553-6760, www.carriget.com)

Data Warehouse Security

1) Key Security Services
2) Views
3) Access Control
4) Roles
5) Encryption
6) Audit Trails
7) Security Holes
8) Intrusion Detection
9) Misuse Detection

Introduction
• A key feature provided by database systems is good security services.
– In a database system with good security, applications do not have to worry about problems that arise with security violations.
• A data warehouse also requires good security services because it holds key, corporate data.

[Figure: EIS, Database System, Security Services]

Key Security Services
• Access Control
– Controls who accesses what data
• Administration of Access Control
– Used to give access to users as well as track who has various accesses and what kind of accesses are given to a user or group of users
– Audit tracks the usage of the data warehouse

Security in a Data Warehouse
• A data warehouse consolidates an organization's key data in one place.
– A data warehouse increases the security risk that unauthorized users will try to obtain this data
• Security aspects of EIS applications must be designed and implemented very thoroughly.
• Access control and audits are two of the critical components of security.

Data Warehouse Security Components
• Database system components that can be used to protect a data warehouse include:
– Views – Allow users to only see certain rows or columns of data
– Access control – Indicate which users have access to what data
– Administration – This component is used to actually give access to groups of users and to define the accesses given to either an individual or a group.
– Encryption – Protect data from access outside of the DBMS
– Audit – Track what users are doing

Views in Data Warehouse
• A view is a virtual table defined by a query over one or more tables. Users may be given access to the view without access to the base table.
• Views provide some security assistance because they can hide data from users.

EMPLOYEE
Name    Address            Salary
Hank    1 South Street     $50,000
Esther  2 North Street     $80,000
Tom     34 Main Street     $90,000
Sue     45 Easy Street     $28,500
Dave    56 5th Avenue      $35,000
Pete    7 Broadway         $60,000
Kathy   89 Western Avenue  $85,000

View Example
• A view called SAFE_EMPLOYEE may be created as:
    CREATE VIEW SAFE_EMPLOYEE AS (SELECT name, address FROM EMPLOYEE)
• Now users of the view SAFE_EMPLOYEE will not even know that salary exists.

SAFE_EMPLOYEE (view; "Salary" is effectively hidden)
Name    Address
Hank    1 South Street
Esther  2 North Street
Tom     34 Main Street
Sue     45 Easy Street
Dave    56 5th Avenue
Pete    7 Broadway
Kathy   89 Western Avenue
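A quick way to see that the view hides the salary column is to create it in a scratch database. This is a sketch assuming Python's sqlite3 module, with two of the EMPLOYEE rows from the slide. SQLite has no GRANT or REVOKE, so the step of giving users access to the view but not to the base table is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Name TEXT, Address TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("Hank", "1 South Street", 50000),
    ("Esther", "2 North Street", 80000),
])

# The view exposes only the name and address columns of the base table
conn.execute("CREATE VIEW SAFE_EMPLOYEE AS SELECT Name, Address FROM EMPLOYEE")

cursor = conn.execute("SELECT * FROM SAFE_EMPLOYEE ORDER BY Name")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

# Asking the view for Salary fails: the column does not exist in the view
try:
    conn.execute("SELECT Salary FROM SAFE_EMPLOYEE")
    salary_visible = True
except sqlite3.OperationalError:
    salary_visible = False

print(view_columns, view_rows, salary_visible)
```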

Updating Views
• Restrictions exist on updating views. For the EMPLOYEE table, it is possible to insert into the SAFE_EMPLOYEE view.
– Example: INSERT INTO SAFE_EMPLOYEE VALUES ('Hank', '1 South Street')
– This will insert a NULL into the SALARY column of the base table EMPLOYEE.
• Other restrictions to view updates exist:
– Cannot update a view that is defined with an aggregate
– Cannot update a view that is defined with a GROUP BY

Data Warehouse Access Control
• Access control is implemented in a data warehouse with the SQL GRANT and REVOKE commands.
• Syntax
– GRANT <ALL|UPDATE|DELETE|INSERT|SELECT> ON <object-name> TO <user name>
– Example: GRANT SELECT ON EMPLOYEE TO MARY
• Access control is done by DBAs and creators of tables.
• To remove access the REVOKE command is used.
– Example: REVOKE SELECT ON EMPLOYEE FROM MARY

Database Roles
• Roles provide security administration by allowing users to be grouped into roles. Accesses may then be given to a group of users.
– As an example, some roles for a company might be:
– Administrative assistant
– Loan officer
– Salesperson
• Accesses may be assigned based on roles.
– This dramatically simplifies administration.
– If new tables are created, it is not necessary to add thousands of new accesses.
– Examples (the exact role syntax varies by DBMS):
CREATE ROLE loan_officer
GRANT loan_officer TO Hank, John, Mike
GRANT SELECT ON LOAN TO loan_officer
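The effect of GRANT, REVOKE, and roles can be modeled in a few lines. The sketch below is a toy in-memory model, not a real DBMS interface: the AccessControl class and its method names are hypothetical and exist only to show how a role lets one grant cover several users.

```python
from collections import defaultdict


class AccessControl:
    """Toy model of GRANT/REVOKE with roles (hypothetical, not a DBMS API)."""

    def __init__(self):
        self.grants = defaultdict(set)        # grantee -> {(privilege, object)}
        self.role_members = defaultdict(set)  # role -> {user names}

    def create_role(self, role, members):
        self.role_members[role].update(members)

    def grant(self, privilege, obj, grantee):
        self.grants[grantee].add((privilege, obj))

    def revoke(self, privilege, obj, grantee):
        self.grants[grantee].discard((privilege, obj))

    def is_allowed(self, user, privilege, obj):
        # Allowed if granted directly, or through any role the user belongs to.
        if (privilege, obj) in self.grants[user]:
            return True
        return any(
            user in members and (privilege, obj) in self.grants[role]
            for role, members in self.role_members.items()
        )


ac = AccessControl()

# GRANT SELECT ON EMPLOYEE TO MARY, then REVOKE it again.
ac.grant("SELECT", "EMPLOYEE", "MARY")
mary_before = ac.is_allowed("MARY", "SELECT", "EMPLOYEE")
ac.revoke("SELECT", "EMPLOYEE", "MARY")
mary_after = ac.is_allowed("MARY", "SELECT", "EMPLOYEE")

# One grant to the role covers every member of the role.
ac.create_role("LOAN_OFFICER", ["Hank", "John", "Mike"])
ac.grant("SELECT", "LOAN", "LOAN_OFFICER")
hank_can_read_loan = ac.is_allowed("Hank", "SELECT", "LOAN")
mary_can_read_loan = ac.is_allowed("MARY", "SELECT", "LOAN")
```

Adding a new table then takes one grant per role rather than one per user, which is the administrative saving the slide describes.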

Example of Application-based Roles
• Consider:
[Diagram: Users → Applications → Database System → Data]
• If the database system controls accesses then it does not matter what the application does; accesses are controlled consistently (same for SALES as MARKETING)
• However, more fine-grained access control can be granted in the application.

Application Roles
• The application can restrict:
– Data entry screens
– Reports
• Care must be taken to restrict users in a consistent fashion so that a user cannot jump to a different application and avoid security set up by another application.

Role Based Security in a Data Warehouse
• Both application and database level security are useful in a data warehouse.
• Database level security is needed so that users are only allowed to see data they need to see.
• Application level security can be used to control access to certain menus so that users do not even know what reports exist.

Encryption
• Encryption is the process of coding data so that it can only be read by users who have the key that allows them to decrypt the data.
– Example: A message “sell 500 shares” would appear as “xyzzy” without the key. Once the key is paired with the encrypted string “xyzzy”, it can then be decrypted.
– The size of the key is a factor in how difficult it is to attack the encryption scheme.
• Three places where encryption might be used in a data warehouse:
– Network
– Data
– Tape backups
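The roles of the key and the key size can be illustrated with a toy cipher. This is a sketch only, assuming a one-time-pad style XOR in place of whatever cipher a real product uses; production systems rely on vetted algorithms such as AES rather than hand-written code like this. The helper names xor_bytes and key_space are made up for the illustration.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy one-time pad)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"sell 500 shares"
key = secrets.token_bytes(len(message))  # random key, same length as the message

ciphertext = xor_bytes(message, key)     # unreadable without the key
recovered = xor_bytes(ciphertext, key)   # applying the same key decrypts


def key_space(bits: int) -> int:
    """Number of possible keys an attacker must consider for a key of this size."""
    return 2 ** bits


# Each extra key bit doubles the brute-force search space.
ratio_128_vs_56 = key_space(128) // key_space(56)
```

The ratio shows why key size matters: a 128-bit key space is 2**72 times larger than a 56-bit one.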

Network Encryption
• In a data warehouse application, data and queries are transmitted through a network.
– Attackers might be able to steal network traffic just by breaking into the network medium.
• One way to reduce the risk of this threat is to encrypt traffic on the network.
[Diagram: User → Network → Data Warehouse Application → Database System → Tape Backup]

Network Encryption (cont.)
• Network encryption is critical because the network connects all of the key components in a data warehouse.
• Without this, it may be possible for the “man in the middle” to masquerade as another user and circumvent existing application and database security.
• Encrypting network traffic mitigates the risk that an attacker could succeed with the “man in the middle” attack.

Data Encryption
• Data encryption refers to encrypting the actual data in the data warehouse.
• If the attackers were to retrieve data from the warehouse, they would have to decrypt it in order to read it.
[Diagram: EIS → Database System → Data Warehouse]

Backup Encryption
• Periodically, databases are copied to some kind of long-term storage (usually tapes).
• If the database is encrypted, but the tapes are not encrypted, the risk exists of someone walking off with the tapes.
[Diagram: EIS → Database System → Data Warehouse → Tape Backup]

Audit Trails
• Audit trails are a means of tracking queries, updates, deletes, and additions of new data to the data warehouse.
– Audit trails are turned on when the DBMS is started and all activity that uses the data warehouse is tracked in the audit trail.
• If a user is suspected of an evil deed, the audit trail can be examined to identify what data has been accessed by users.

Details of DW Audit Trails
• An audit trail of a database system typically includes the following information:
– User ID, Date, Time, Object that has been accessed (table or view), Action that accessed the object (INSERT, UPDATE, DELETE, SELECT)
– For UPDATE, the old value and new value is tracked.
• For data warehouses, the SELECT is often used to track the queries that have been run against the warehouse.
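An audit trail that records the old and new value of an UPDATE can be sketched with a trigger. The example assumes SQLite (through Python's sqlite3 module) rather than a commercial DBMS audit facility, and the AUDIT_TRAIL table layout is hypothetical. SQLite has no user accounts, so the User ID field listed above is left out of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER PRIMARY KEY, name TEXT, salary REAL);

-- Hypothetical audit table: date/time, object, action, old and new value.
CREATE TABLE AUDIT_TRAIL (
    logged_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    object     TEXT,
    action     TEXT,
    row_key    INTEGER,
    old_value  REAL,
    new_value  REAL
);

-- Record every salary change together with its before and after value.
CREATE TRIGGER audit_salary_update AFTER UPDATE OF salary ON EMPLOYEE
BEGIN
    INSERT INTO AUDIT_TRAIL (object, action, row_key, old_value, new_value)
    VALUES ('EMPLOYEE', 'UPDATE', OLD.ssn, OLD.salary, NEW.salary);
END;
""")

conn.execute("INSERT INTO EMPLOYEE VALUES (10, 'Hank', 100)")
conn.execute("UPDATE EMPLOYEE SET salary = salary * 2 WHERE ssn = 10")

audit_rows = conn.execute(
    "SELECT object, action, row_key, old_value, new_value FROM AUDIT_TRAIL"
).fetchall()
print(audit_rows)
```

Auditing SELECT statements cannot be done with triggers in this way; that part of an audit trail depends on the DBMS's own audit facility.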

Other Uses for DW Audit Trails
• Audit trails can be used to identify the most popular data in the warehouse.
– This information can be used to optimize queries
• An additional use for audit trails is performance tuning of the data warehouse.
– Administrators know where to focus their efforts
– Reduces administrative overhead

Dealing with Known Security Holes
• Commercial database systems and operating systems are often filled with holes that allow users to obtain unauthorized access.
– To reduce the risk of these known holes, vendors often provide “fixes” to their products as soon as these holes become public.
• It is important to constantly keep up with known security holes and apply the latest fixes as soon as they are released.
• One of the key risks surrounding a data warehouse is that privileged users have the “keys to the kingdom”.

The Risk of “Privileged Users”
• "Privileged users" include:
– Data warehouse administrators
– Operating system programmers
– Operators in the computer center
• These users can:
– Modify, delete and query any data in the warehouse
– Modify the audit trail to mask their actions
– Give other users unauthorized access
• Numbers of "privileged users" could be anywhere from 20 to 30 in some organizations.

Reducing the Risk of Privileged Users
• One way to reduce the risk of privileged users is to separate security administration from database administration.
– This would separate the task of giving accesses and managing the audit trail from the task of making sure the data in the warehouse was correct and properly optimized.
[Diagram: Security Services (Access Control, Audit) kept separate from Database Services (Database Tuning, Query Optimization, Backups)]

Information Security Attacks
• Two types of information security attacks on data warehouses are:
– Intrusion – An intrusion occurs when an unauthorized user gains access to the data warehouse. The assumption is the user is external to the environment (e.g. a hacker).
– Misuse – Misuse, often referred to as the insider problem, occurs when a user who has access to the warehouse uses that access for an unauthorized purpose
• Audit trails can be used to identify either type of attack, but identification of misuse is typically MUCH harder to do than intrusion.

Intrusion Detection
• An intrusion is defined as an unauthorized access to a system.
• To reduce the risk of intrusion, intrusion detection tools are used.
– These tools monitor access to the data warehouse and sound an alarm if unauthorized accesses are detected.
[Diagram: USER → INTRUSION DETECTION SYSTEM → DATA WAREHOUSE]

Misuse Detection
• Unwanted access by a user that has the ability to access data is referred to as misuse.
– This is also known as the insider problem.
– Some estimates have shown that 80% of computer crime is a result of misuse.
• For data warehouses the threat of misuse is high, especially by privileged users.

Summary
• DBMS Security is useful for data warehouses to hide data from users with views and to restrict access to data with GRANT and REVOKE.
• Application Level Security assists EIS that access data warehouses by hiding certain reports from users.
• Encryption can be used to further protect against the risk of someone walking off with the data warehouse.
• Audit Trails are useful for:
– Catching attackers
– Identifying usage trends of the data warehouse

Moving Data to the Data Warehouse
(slides in this section are used courtesy of Carrig Emerging Technology Ph: 410-553-6760 www.carriget.com)

Moving Data to the Data Warehouse
1) Moving Data into the Data Warehouse
2) Updating the Data Warehouse
3) Full Refresh
4) Copy Only the Changes
5) BCP
6) Simple Transformations
7) Complex Transformations
8) Commercial ETL Tools

Moving Data into the Data Warehouse
• Data must be moved to the data warehouse from source systems.
• Some key issues:
– Determine the frequency of data updates -- how often should data be moved from source systems to the data warehouse.
– Various means of updating data in the warehouse exist:
– SQL Commands
– Database system load programs (e.g. SQL Server’s BCP)
– Commercial tools

Updating the Data Warehouse
• OLTP (On-Line Transaction Processing) Systems have to send their updates to the data warehouse.
[Diagram: Finance, Inventory, and Sales OLTP Applications → Data Warehouse (Finance, Inventory, and Sales Subject Areas)]

Frequency of Updates to the Data Warehouse
• Updates may occur daily, weekly, monthly, or in real-time.
[Diagram: Finance OLTP Application → Daily Update → Finance Subject Area; Inventory OLTP Application → Weekly Update → Inventory Subject Area; Sales OLTP Application → Monthly Update → Sales Subject Area]

Determining the Frequency of Updates
• Requirements should drive update frequency
• Range of updates runs from real-time to quarterly.
– Real time update
– Expensive
– Requires update of warehouse while users are querying
– Daily update
– Somewhat cheaper than real time, but significant maintenance required if the warehouse has lots of tables.
– Monthly or weekly update
– Much more manageable

Updating the Warehouse
• Full Refresh vs. Only the Changes
[Diagram: Finance, Inventory, and Sales OLTP Applications feed the Finance, Inventory, and Sales Subject Areas of the Data Warehouse. Full refresh of some tables; changes for other tables since last update.]

Full Refresh
• Copy the entire source table in the OLTP system to the destination table in the Data Warehouse.
[Diagram: Source OLTP (Source Table) → Full Refresh → Target Data Warehouse (Target Table)]

Copy Only the Changes
• Copy only the changes to the source table in the OLTP system to the destination table in the data warehouse.
[Diagram: Source OLTP (Source Table) → Target Data Warehouse (Target Table). The target table holds the modified data since the last update to the warehouse, plus data from two updates ago: historical data no longer in the source OLTP.]

Full Refresh vs. Only the Changes
• Full Refresh
– Pros
– Much easier to implement
– Less chance of messing up your database (good data integrity)
– Cons
– Can take a lot longer to actually do -- may “run out of night”
– Can lose out on warehouse ability to track historical data.
• Only the Changes (DELTA)
– Pros
– Tracks historical data
– Cons
– Can be very hard to implement
– Can require changes in source applications (more on this later)

Full Refresh Using INSERT-SELECT
• One way to move data from one table to another is via the INSERT-SELECT.
– Syntax: INSERT INTO <target_table> <any sql SELECT statement>
• Example:
INSERT INTO DW_EMPLOYEE SELECT * FROM EMPLOYEE

Updating Changes Using INSERT-SELECT
• Changes may be moved by adding a WHERE clause to the INSERT-SELECT.
• Example (rows updated in the current month, assuming the source table has a DATE_UPDATED column):
INSERT INTO DW_EMPLOYEE SELECT * FROM EMPLOYEE
WHERE DATEPART(m, DATE_UPDATED) = DATEPART(m, CURRENT_TIMESTAMP)
AND DATEPART(yy, DATE_UPDATED) = DATEPART(yy, CURRENT_TIMESTAMP)
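The difference between the two INSERT-SELECT styles shows up in a small experiment. This is a sketch under assumptions: SQLite through Python's sqlite3 module stands in for the source and warehouse databases, the date_updated column and the helper names full_refresh and copy_changes are hypothetical, and dates are ISO strings so that plain comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER PRIMARY KEY, name TEXT, salary REAL, date_updated TEXT);
CREATE TABLE DW_EMPLOYEE (ssn INTEGER, name TEXT, salary REAL, date_updated TEXT);
INSERT INTO EMPLOYEE VALUES (1, 'John', 300, '1999-04-01');
INSERT INTO EMPLOYEE VALUES (2, 'Mike', 400, '1999-04-20');
""")


def full_refresh(conn):
    """Empty the target, then copy the entire source table."""
    conn.execute("DELETE FROM DW_EMPLOYEE")
    conn.execute("INSERT INTO DW_EMPLOYEE SELECT * FROM EMPLOYEE")


def copy_changes(conn, last_load_date):
    """Append only rows changed since the last load; older rows stay as history."""
    conn.execute(
        "INSERT INTO DW_EMPLOYEE SELECT * FROM EMPLOYEE WHERE date_updated > ?",
        (last_load_date,),
    )


full_refresh(conn)
count_after_refresh = conn.execute("SELECT COUNT(*) FROM DW_EMPLOYEE").fetchone()[0]

# The source changes after the first load; only the delta is moved.
conn.execute("UPDATE EMPLOYEE SET salary = 450, date_updated = '1999-04-24' WHERE ssn = 2")
copy_changes(conn, "1999-04-20")

mike_history = conn.execute(
    "SELECT salary FROM DW_EMPLOYEE WHERE ssn = 2 ORDER BY date_updated"
).fetchall()
print(count_after_refresh, mike_history)
```

After the delta load the warehouse holds both of Mike's salaries, which is the history a second full refresh would have overwritten.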

Updating Using BCP
• BCP is the bulk copy program that comes with MS SQL Server.
– Bulk copy (BCP) moves data to or from a flat file to a SQL table.
• Syntax: bcp <table> [in | out] <data file>
[Diagram: Source OLTP (Source Table) → Unload → Temporary Flat File → Load → Target Data Warehouse (Target Table)]

BCP Example
• To bulk copy data from the publishers table in the pubs database to the publishers.txt data file in ASCII text format, execute from the command prompt:
bcp pubs..publishers out publishers.txt -c -Sservername -Usa -Ppassword
• To bulk copy data from the publishers.txt file into the pub2 table in the pubs database, execute from the command prompt:
bcp pubs..pub2 in publishers.txt -c -Sservername -Usa -Ppassword

Simple Transformation
• In addition to moving data from OLTP to the warehouse, it is often necessary to transform data.
– Example: System A stores TOTAL_CLOTH in meters and system B stores TOTAL_CLOTH in yards. Before the data is moved from system A, we need to transform the data.
[Diagram: Store 31 (Pattern = 31, TOTAL_CLOTH = 50 yards) and Store 32 (Pattern = 32, Total Cloth = 20 meters) → TRANSFORMATION → Data Warehouse (Pattern = 31, Total Cloth = 50 yards; Pattern = 32, Total Cloth = about 21.9 yards)]

Complex Transformation
• More complex transformations occur when a value in a source table must be moved to several locations in a data warehouse.
[Diagram: source value BLUE3484 (Color = Blue, 34 Inches, LS). BLUE goes to the COLOR table (TABLE 1). 34 in is converted to centimeters and stored as 86.36 cm (TABLE 2). Code 84 is decoded (long sleeves) and put in two tables (TABLE 3 and TABLE 4, both Long Sleeves).]
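Both kinds of transformation are easy to express as small functions. The sketch below is illustrative only: the function names are made up, and the layout assumed for the product code (color name, two-digit length in inches, two-digit style code, with 84 meaning long sleeves) is inferred from the single BLUE3484 example rather than from any documented format.

```python
METERS_PER_YARD = 0.9144  # exact definition of the yard
CM_PER_INCH = 2.54        # exact definition of the inch

# Hypothetical style-code lookup; only code 84 appears in the example.
STYLE_CODES = {"84": "long sleeves"}


def to_yards(amount, unit):
    """Simple transformation: bring TOTAL_CLOTH to a single unit (yards)."""
    if unit == "yards":
        return amount
    if unit == "meters":
        return amount / METERS_PER_YARD
    raise ValueError(f"unknown unit: {unit}")


def decode_product_code(code):
    """Complex transformation: split one source value into several target values."""
    color = code[:-4]              # e.g. "BLUE"
    inches = int(code[-4:-2])      # e.g. 34
    style = STYLE_CODES[code[-2:]] # e.g. "84" -> "long sleeves"
    return {
        "color": color,
        "length_cm": round(inches * CM_PER_INCH, 2),
        "style": style,
    }


store_31_yards = to_yards(50, "yards")
store_32_yards = to_yards(20, "meters")
decoded = decode_product_code("BLUE3484")
print(store_31_yards, round(store_32_yards, 1), decoded)
```

In a real load, each field of the decoded record would be written to its own warehouse table, as the diagram indicates.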

Commercial ETL Tools
• Key tools in the marketplace
– Informatica
– Ardent
– DecisionBase (Platinum)
– Microsoft Data Transformation Services
• All provide libraries of common transformations.
• All provide the ability to code complex transformations.

Data Transformation Services
[Screenshot: Data Transformation Services]

[Screenshots of the Data Transformation Services steps, in order:]
– Choose a Source
– Choose a Destination
– Choose to use a Query for Transfer
– Enter SQL Query
– Choose Destination TableName
– Verify Transformation
– Decide When to Run Transformation
– Final Verification

Run Transformation
[Screenshot: Run Transformation]

Check Results
select * from orderfact

orderid  orderdate                productid  productname                quantity  unitprice  discount
10248    1996-07-04 00:00:00.000  11         Queso Cabrales             12        14.0000    0.0
10248    1996-07-04 00:00:00.000  42         Singaporean Hokkien Fried  10        9.8000     0.0
10248    1996-07-04 00:00:00.000  72         Mozzarella di Giovanni     5         34.8000    0.0
10249    1996-07-05 00:00:00.000  14         Tofu                       9         18.6000    0.0
10249    1996-07-05 00:00:00.000  51         Manjimup Dried Apples      40        42.4000    0.0

Summary
• ETL is one of the hard parts of building a data warehouse.
• Either full refreshes of data or just the changes may be done.
• Doing full refresh is easy, but historical data is lost and it may take a lot of time.
• Tracking changes is a tough business.
• ETL commercial tools are beginning to mature and can lessen the pain of this task.

More Ways of Moving Data to the Data Warehouse
(slides in this section are used courtesy of Carrig Emerging Technology Ph: 410-553-6760 www.carriget.com)

More Ways of Moving Data to the Data Warehouse
1) Determining What Data Has Changed
2) Recovery Logs
3) Triggers
4) Insert Triggers
5) Delete Triggers
6) Update Triggers
7) Manual Detection

More Ways of Moving Data to the Data Warehouse
• There is a need to move data into the data warehouse from OLTP and DSS applications
• The problem is detecting what data needs to be moved into the data warehouse
• Three methods:
– Recovery Logs
– Triggers
– Manual Techniques

Determining What Data Has Changed
• Problem: How to get updates made to the source to the same information in the data warehouse?
[Diagram: SOURCE (OLTP) TABLE A → UPDATES → ? → DATA WAREHOUSE TABLE B. How to get updates from Source Table A to Data Warehouse Table B]

Determining What Data Has Changed (cont.)
• Problem: How to get updates made to multiple sources to the same information in the data warehouse?
[Diagram: updates to “ROW X” in source TABLE A must reach “ROW X” in data warehouse TABLE B]

Source (OLTP): Insert into Employee Values ('Joe', 'Sales', '50000')

Employee
NAME   DEPT    SALARY
Fred   Mktg    35000
Hank   Sales   60000
Sue    IT      71000
Joe    Sales   50000

Data warehouse tables affected by the insert (old value → new value):

EmployeeCount
DEPT    COUNT
Mktg    1
Sales   1 → 2
IT      1
HR      0

SalaryInfo
DEPT    AVG SAL          TOT SAL
Mktg    35000            35000
IT      71000            71000
HR      0                0
Sales   60000 → 55000    60000 → 110000

What is the Recovery Log?
• Recovery log is used for transaction processing
– Used to handle errors
– Does contain before and after image.
• Recovery log can be used to identify the data to be updated in the data warehouse.
– Change Data Capture Utility – This scans the database log and identifies all changes that the user is interested in and either writes them to a file or stores them in another table.

Change Data Capture Utility in Action
[Diagram: SOURCE OLTP DATA → all changes to DBMS → DBMS RECOVERY LOG → READS → CHANGE DATA CAPTURE UTILITY → WRITES → DATA WAREHOUSE]

Example of Using Recovery Log
• Consider an update to the Employee table
– The information is recorded in the log
– The change data capture reconstructs update
– Can then be sent to the data warehouse
[Diagram: UPDATE EMPLOYEE SET Salary=Salary*2.0 Where SSN=10 → LOG (TABLE=EMPLOYEE SSN=10 OldSalary=100, NewSalary=200) → CHANGE DATA CAPTURE RECONSTRUCTS UPDATE → DATA WAREHOUSE]

Using the Recovery Log
• Recovery logs are usually in proprietary format. Use commercial tools to read the log and identify the changes.
• Commercial tools such as CA’s log analyzer can place the results of their work in a table.
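The reconstruction step can be pictured as turning a before/after log record back into an UPDATE statement. Real recovery logs are proprietary binary formats, so the dictionary layout used here and the function name reconstruct_update are purely hypothetical; the sketch assumes a tool has already decoded the log entry.

```python
def reconstruct_update(log_record):
    """Build a parameterized UPDATE from a decoded before/after log record."""
    before, after = log_record["before"], log_record["after"]
    # Only columns whose value actually changed need to be sent on.
    changed = {col: new for col, new in after.items() if before.get(col) != new}
    set_clause = ", ".join(f"{col} = ?" for col in changed)
    key_col, key_val = log_record["key"]
    sql = f"UPDATE {log_record['table']} SET {set_clause} WHERE {key_col} = ?"
    return sql, [*changed.values(), key_val]


# Hypothetical decoded log entry for the salary update in the example.
log_record = {
    "table": "EMPLOYEE",
    "key": ("SSN", 10),
    "before": {"Salary": 100},
    "after": {"Salary": 200},
}

sql, params = reconstruct_update(log_record)
print(sql, params)
```

Note that the reconstructed statement sets the after-image value (200) instead of repeating the original expression Salary*2.0, because the log records values, not the SQL text.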

Summary of Change Data Capture
• Pro
– Log exists anyway, might as well use it to find what has changed
• Con
– Some difficult scenarios may occur where it is hard to see what the new update should be in the Data Warehouse.
– Many tables will be in the source that have nothing to do with the data warehouse, but change data capture will process their changes as well.
– Proprietary format, may not be supported in many DBMS and will always lag behind DBMS development.

Triggers
• Triggers allow DBAs to specify that when an “event” such as an INSERT, UPDATE, or DELETE occurs on a table, another event is triggered.
• Triggers can be used to detect the changes and perform data warehouse updates.
– Triggers are used to identify changes that are needed by the warehouse.
– A trigger can be added to a source table and whenever the source table is updated, an update can be placed either directly in the warehouse or in a staging table that tracks all updates.
– A different trigger might be run on key updates so that the data warehouse nightly process would know what data has changed.

Example of a Trigger
– STEP 1: INSERT into TABLE A VALUES (X, Y)
– STEP 2: When values (X, Y) are inserted, the insert sets off the TRIGGER
– STEP 3: The TRIGGER inserts values (X, Y) into a “STAGING” area
– STEP 4: The Nightly Process inserts values (X, Y) into the Data Warehouse
[Diagram: TABLE A → TRIGGER → STAGING → Nightly Process → DATA WAREHOUSE TABLE A]

Real-Life Trigger Example
• OLTP/DSS Data - Employee table:
– Employee (ssn, name, salary)
• DW Data - Summary table:
– EmployeeStatistics (total number employees, total salary paid, average salary).
• When a row is inserted in the employee table, we need to update the EmployeeStatistics table.
– Shown on the next page
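The four steps can be run end to end in a small script. This is a sketch assuming SQLite via Python's sqlite3 module; the table names TABLE_A, STAGING, and DW_TABLE_A and the function nightly_process are placeholders, and the warehouse is simulated as a table in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE_A (x INTEGER, y INTEGER);     -- source table
CREATE TABLE STAGING (x INTEGER, y INTEGER);     -- staging area
CREATE TABLE DW_TABLE_A (x INTEGER, y INTEGER);  -- stand-in for the warehouse table

-- STEP 2 and STEP 3: each insert on the source also lands in staging.
CREATE TRIGGER stage_insert AFTER INSERT ON TABLE_A
BEGIN
    INSERT INTO STAGING VALUES (NEW.x, NEW.y);
END;
""")


def nightly_process(conn):
    """STEP 4: move staged rows into the warehouse, then clear the staging area."""
    conn.execute("INSERT INTO DW_TABLE_A SELECT x, y FROM STAGING")
    conn.execute("DELETE FROM STAGING")


# STEP 1: the OLTP application inserts into the source table.
conn.execute("INSERT INTO TABLE_A VALUES (1, 2)")
staged = conn.execute("SELECT * FROM STAGING").fetchall()

nightly_process(conn)
warehouse = conn.execute("SELECT * FROM DW_TABLE_A").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM STAGING").fetchone()[0]
print(staged, warehouse, remaining)
```

Staging keeps the trigger cheap for the OLTP application, since the heavier warehouse load is deferred to the nightly run.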

Insert Trigger Example
CREATE TRIGGER EmployeeInsertTrigger ON Employee FOR INSERT AS
BEGIN
UPDATE EmployeeStatistics SET NoEmployee = NoEmployee + (SELECT COUNT(*) FROM INSERTED)
UPDATE EmployeeStatistics SET TotSalary = TotSalary + (SELECT SUM(Salary) FROM INSERTED)
UPDATE EmployeeStatistics SET AvgSalary = TotSalary / NoEmployee
END

Insert Trigger in Action
COMMANDS                                        RESULTS
INSERT INTO EMPLOYEE VALUES (1, 'John', 300)    (1 ROW(S) AFFECTED)
INSERT INTO EMPLOYEE VALUES (2, 'Mike', 400)    (1 ROW(S) AFFECTED)

SELECT * FROM EMPLOYEE
Employee
EmpId  Name  Salary
1      John  300.00
2      Mike  400.00

SELECT * FROM EMPLOYEESTATISTICS
EmployeeStatistics
NoEmployee  TotSalary  AvgSalary
2           700.00     350.00

Delete Trigger Example
CREATE TRIGGER EmployeeDeleteTrigger ON Employee FOR DELETE AS
BEGIN
DECLARE @numberEmployee int
UPDATE EmployeeStatistics SET NoEmployee = NoEmployee - (SELECT COUNT(*) FROM DELETED)
UPDATE EmployeeStatistics SET TotSalary = TotSalary - (SELECT SUM(Salary) FROM DELETED)
SELECT @numberEmployee = NoEmployee FROM EmployeeStatistics
IF @numberEmployee > 0
BEGIN
UPDATE EmployeeStatistics SET AvgSalary = TotSalary / NoEmployee
END
ELSE
UPDATE EmployeeStatistics SET AvgSalary = 0.0
END

Update Trigger Example
CREATE TRIGGER EmployeeUpdateTrigger ON Employee FOR UPDATE AS
BEGIN
IF UPDATE (Salary)
UPDATE EmployeeStatistics SET TotSalary = TotSalary - (SELECT SUM(Salary) FROM DELETED) + (SELECT SUM(Salary) FROM INSERTED)
UPDATE EmployeeStatistics SET AvgSalary = TotSalary / NoEmployee
END
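The three triggers can be exercised together to check that the summary table stays consistent. The sketch below ports them to SQLite (through Python's sqlite3 module) as an assumption of convenience. SQLite triggers fire once per row and use NEW and OLD, so they are not the same as the statement-level INSERTED and DELETED tables of the SQL Server versions above, although the arithmetic is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT, Salary REAL);
CREATE TABLE EmployeeStatistics (NoEmployee INTEGER, TotSalary REAL, AvgSalary REAL);
INSERT INTO EmployeeStatistics VALUES (0, 0.0, 0.0);

CREATE TRIGGER EmployeeInsertTrigger AFTER INSERT ON Employee
BEGIN
    UPDATE EmployeeStatistics
    SET NoEmployee = NoEmployee + 1, TotSalary = TotSalary + NEW.Salary;
    UPDATE EmployeeStatistics SET AvgSalary = TotSalary / NoEmployee;
END;

CREATE TRIGGER EmployeeDeleteTrigger AFTER DELETE ON Employee
BEGIN
    UPDATE EmployeeStatistics
    SET NoEmployee = NoEmployee - 1, TotSalary = TotSalary - OLD.Salary;
    -- Guard against dividing by zero once the last employee is removed.
    UPDATE EmployeeStatistics
    SET AvgSalary = CASE WHEN NoEmployee > 0 THEN TotSalary / NoEmployee ELSE 0.0 END;
END;

CREATE TRIGGER EmployeeUpdateTrigger AFTER UPDATE OF Salary ON Employee
BEGIN
    UPDATE EmployeeStatistics SET TotSalary = TotSalary - OLD.Salary + NEW.Salary;
    UPDATE EmployeeStatistics SET AvgSalary = TotSalary / NoEmployee;
END;
""")


def stats(conn):
    return conn.execute("SELECT * FROM EmployeeStatistics").fetchone()


conn.execute("INSERT INTO Employee VALUES (1, 'John', 300)")
conn.execute("INSERT INTO Employee VALUES (2, 'Mike', 400)")
after_inserts = stats(conn)

conn.execute("UPDATE Employee SET Salary = 500 WHERE EmpId = 2")
after_update = stats(conn)

conn.execute("DELETE FROM Employee")
after_deletes = stats(conn)
print(after_inserts, after_update, after_deletes)
```

The values after the two inserts match the "Insert Trigger in Action" results: 2 employees, 700.00 total, 350.00 average.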

Summary of Using Triggers
• Pro
– Only needed for tables whose data is going to go to the DW
• Con
– Additional work needed to create detailed triggers
– Non-trivial to generate a trigger to implement appropriate action
– May not be acceptable for commercial software on source system

Other Ways to Determine What Has Changed
• There are other manual ways of detecting the change and doing DW updates
– Look at each row of OLTP and the data in the warehouse
– Compare the differences between the two files; if the data is not in the warehouse, add it!
[Diagram: OLTP (Hank, John, Mike, Sam) is COMPARED with DATA WAREHOUSE (Hank, John, Mike); ADD THE DIFFERENCES]
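The compare-and-add technique from the diagram reduces to a set difference. A minimal sketch, assuming each row can be identified by a single key (here the employee name) and that the helper name find_missing_rows is made up for the illustration:

```python
def find_missing_rows(oltp_rows, warehouse_rows):
    """Return OLTP rows that are not yet in the warehouse, in OLTP order."""
    warehouse_keys = set(warehouse_rows)  # set lookup avoids rescanning the warehouse
    return [row for row in oltp_rows if row not in warehouse_keys]


oltp = ["Hank", "John", "Mike", "Sam"]
warehouse = ["Hank", "John", "Mike"]

missing = find_missing_rows(oltp, warehouse)
warehouse.extend(missing)  # add the differences
print(missing, warehouse)
```

Even with the set lookup, every row on both sides still has to be read, which is why this approach becomes slow and costly on large tables.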

Manually Identifying What Has Changed
• Pro
– Flexible
• Con
– Very expensive
– Could take a long time

Summary
• Recovery Logs
• Triggers
• Manual Detection

Data Warehouse Design
(slides in this section are used courtesy of Carrig Emerging Technology Ph: 410-553-6760 www.carriget.com)

Data Warehouse Design
1) Overview
2) Describing a Design - ER Diagrams
3) Design Normalization
4) Star Schema Design

Overview
• How to describe a design
– Entity Relationship (ER) Diagram
• Types of Designs
– Normalized
– Star Schema
– Snowflake

Describing a Design
• Different techniques exist; the most prevalent is the ER (Entity-Relationship) Diagram
• Entities
– Things that occur in the real world, usually nouns, e.g. product, part, employee, etc.
• Relationships
– How entities interact, usually verbs. Example: one employee may attend many colleges
– Types of relationships
– 1-1
– 1-Many
– Many-1
– Many-Many

Examples of Relationships
[Diagram: 1-1, 1-MANY, MANY-1, MANY-MANY]

Normalized Design
• Methodology
– All 1-1 relationships are placed in a single table.
– Many-many relationships require two tables that store the single-valued relationships and one linking table that indicates how the entities are related. The relationship is represented in the linking table by referencing keys in the two tables that represent each entity in the relationship.
• Checking the design
– In a Normalized Design, there are many different normalized forms. Each normal form (NF) builds on the previous one so that a table in 2NF is, by definition, in 1NF.
– 1NF
– 2NF
– 3NF

Dealing With Many-Many Relationships
• For Many-Many
– Two 1-1 Tables (SUPPLIER, PARTS)
– One linking table (SP)
– Ex: Suppliers, Parts are the 1-1, SP is the linking table that says who sells what parts.

SUPPLIER
S#  SNAME
1   SEARS
2   OFFICE DEPOT

PARTS
P#  PNAME
1   HAMMERS
2   NAILS

SP
S#  P#
1   1
1   2
2   1
2   2

Normalized Design: Example
• A store sells a product which is supplied by a given vendor. The product is purchased by a customer at a certain time.
– Entities: Customer, Product, Store
– Relationships: Customer buys Product
– Product is located in Store
– Product is supplied By a Vendor
[Diagram: CUSTOMER BUYS PRODUCT; PRODUCT IS-LOCATED-IN STORE; VENDOR supplies PRODUCT]
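The SUPPLIER, PARTS, and SP tables can be loaded and joined to answer "who sells what". The sketch assumes SQLite via Python's sqlite3 module and renames the S# and P# columns to s_no and p_no, since # is not valid in an unquoted column name there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUPPLIER (s_no INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE PARTS (p_no INTEGER PRIMARY KEY, pname TEXT);

-- Linking table: one row per (supplier, part) pair.
CREATE TABLE SP (
    s_no INTEGER REFERENCES SUPPLIER (s_no),
    p_no INTEGER REFERENCES PARTS (p_no),
    PRIMARY KEY (s_no, p_no)
);

INSERT INTO SUPPLIER VALUES (1, 'SEARS'), (2, 'OFFICE DEPOT');
INSERT INTO PARTS VALUES (1, 'HAMMERS'), (2, 'NAILS');
INSERT INTO SP VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")

# Resolving the many-many relationship takes two joins through the linking table.
sears_parts = conn.execute("""
    SELECT s.sname, p.pname
    FROM SP
    JOIN SUPPLIER s ON s.s_no = SP.s_no
    JOIN PARTS p ON p.p_no = SP.p_no
    WHERE s.sname = 'SEARS'
    ORDER BY p.pname
""").fetchall()
print(sears_parts)
```

Even this small question needs two joins, which hints at why queries against a fully normalized design can involve numerous joins.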

Checking a Normalized Design
• Normalization
– Used to reduce data insertion, delete, and update anomalies caused by bad designs.
– Enables users to quickly check a design and make sure there are no glaring holes in the design.
– 1NF – All “cells” are atomic -- i.e. each entry in a column contains only one value
– 2NF – All non-key values are functionally dependent upon the entire primary key -- i.e. if the primary key changes, all non-key columns are affected.
– 3NF – No transitive dependencies -- i.e. every non-key column depends directly on the primary key and not on another non-key column.

Overview of Normalized Design
• Pro
– Relatively easy to change
• Con
– Queries can involve numerous joins
– The massive number of tables and links between tables makes it hard for customers to build their own queries

Star Schema
• Methodology
– Single fact table in the middle describing a key event (e.g. sale) surrounded by dimension tables (e.g. time, location, employee)
[Diagram: FACT table in the center surrounded by dimensions D1 to D5; D = DIMENSIONS]

Star Schema: Methodology
• Identify a key fact that occurs.
– Usually some event creates a real fact: selling a product in a store on Wednesday, a patient visiting a hospital, etc.
• Identify all the dimensions of the data being used. Think of a dimension as a way to slice the data.
– Ex: by time, by customer, by product, etc.
• Drill down operations are very well supported

Star Schema: Example
• A store sells a product which is supplied by a given vendor. The product is purchased by a customer at a certain time.
• Fact
– CustomerPurchase
• Dimensions are
– Customer
– Product
– Time
– Vendor

Star Schema: Example (cont.)
[Diagram: Sale fact table in the center, linked to Customer, Time, Store, Product, and Price]

SALE
SALE ID  CUST. ID  STORE ID  PROD. ID  PRICE  TIME
1        3         7         4         $3.00  4/24/99

CUSTOMER
CUST. ID  NAME  PHONE  Buys Apples  Has Big Car
3         FRED  1234   Y            Y

TIME
DAY  MONTH  YEAR  QTR
24   4      99    2Q
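Slicing the fact table by a dimension, and then drilling down, looks like this in practice. The sketch assumes SQLite via Python's sqlite3 module, keeps only the Customer and Time dimensions, and adds a second sale row that is invented so the grouping has something to sum; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables
CREATE TABLE CUSTOMER (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE TIME_DIM (time_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       year INTEGER, qtr TEXT);

-- Fact table: one row per sale, pointing at its dimensions
CREATE TABLE SALE (sale_id INTEGER PRIMARY KEY, cust_id INTEGER, time_id INTEGER,
                   price REAL);

INSERT INTO CUSTOMER VALUES (3, 'FRED');
INSERT INTO TIME_DIM VALUES (1, 24, 4, 99, '2Q'), (2, 3, 5, 99, '2Q');
INSERT INTO SALE VALUES (1, 3, 1, 3.00);
INSERT INTO SALE VALUES (2, 3, 2, 2.00);  -- invented second sale for the example
""")

# Slice by the time dimension at quarter level.
by_quarter = conn.execute("""
    SELECT t.qtr, SUM(s.price)
    FROM SALE s JOIN TIME_DIM t ON t.time_id = s.time_id
    GROUP BY t.qtr
""").fetchall()

# Drill down from quarter to month by adding one column to the grouping.
by_month = conn.execute("""
    SELECT t.qtr, t.month, SUM(s.price)
    FROM SALE s JOIN TIME_DIM t ON t.time_id = s.time_id
    GROUP BY t.qtr, t.month
    ORDER BY t.month
""").fetchall()
print(by_quarter, by_month)
```

Each drill-down is the same single join with a finer grouping column, which is why star schemas support these operations so well.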

Star Schema: Overview
• Pro
– Easy for users to navigate and understand
• Con
– Performance – Can end up with one monster fact table, millions of rows
– Flexibility – Not as easy for customers to change the design

Snowflake Schema
• Several stars can be connected to form a snowflake
[Diagram: three connected stars labeled MARKETING, SALES, and PRODUCT, with nodes Distribution, Ad, Direct Mail, Price, Marketing, Revenue, Sales, Location, Parts, Manufacturing, Sale Price, Vendor, Make Chips, Product, Cost, Price, Labor]

Summary
• Two basic types of design
– Star Schema
– Normalized
• Many Data Warehouse vendors sell products built specifically for the star schema
• Some data warehouses insist that normalization is the way to build the data warehouse.

Building a Data Warehouse
(slides in this section are used courtesy of Carrig Emerging Technology Ph: 410-553-6760 www.carriget.com)

Building a Data Warehouse
1) Top Down Approaches
2) Enterprise Data Model Approach
3) "Let Data Users Decide"
4) "Let Data Warehouse Builders Decide"
5) "Let Senior Management Decide"
6) Bottom Up Approach

Building the Data Warehouse
• How to decide what data goes into the data warehouse?
• Methods:
– Top Down
– Using Enterprise Data Models
– "Let data users decide" approach
– "Let data warehouse builders decide" approach
– "Let senior management decide" approach
– Bottom Up
– Combine data marts into a data warehouse

Using Enterprise Data Models
• Use the Enterprise Data Model to decide what data goes into the data warehouse.
– Model key processes.
– Identify key data used by these processes in an enterprise data model -- might be a giant Entity-Relationship diagram.
• Put data in the warehouse based on the enterprise data model. This approach says let the business decide.

An Enterprise Data Model Example
[Diagram: processes MAKE CHIPS, PUT IN BAGS, SELL CHIPS, COUNT $$, BUY MORE POTATOES; data CHIP SUPPLIERS, CHIP RECIPES, INGREDIENTS]

"Enterprise Data Model" Approach
• Pro
– All inclusive -- no chance of leaving key data out.
• Con
– Very difficult to build an EDM.
– If the business model changes, you may have to rebuild the Enterprise Data Model and the data warehouse.
• Ways of Avoiding the Con
– In some cases you can buy an EDM -- if the business is common enough the packaged EDM might be very close and then you just have to modify it to fit your business.

"Let Data Users Decide"
• Let the users of the data warehouse choose what data will go into the warehouse.
– The data users deciding the data warehouse data and design will pay for it as well.
– Also, you can charge users who query the data as well.
[Diagram: SOURCE → USERS → DATA WAREHOUSE]

"Let Data Users Decide": An Example
[Diagram: MARKETING, HUMAN RESOURCES, and FINANCE each decide which of their DATA goes into the DATA WAREHOUSE; candidate items shown include demographics, budget trends, advertising, ethnic group, education, age, spending, and revenue]

"Let Data Users Decide" Approach
• Pro
– Reduces budget problems
– Users know best!
• Con
– Requires marketing
– Could end up with data in the warehouse that is meaningless to the people who run the place.
– Users may not place important data in the warehouse because their budget is small.
– Users who need the data may not use the DW because of budget concerns.
• Ways of Mitigating the Con
– Do not just take money -- try to determine if data is really corporate.

Pay As You Go Warehouse Analogy
[Figure: I-495]

"Let Data Warehouse Builders Decide"
• The technical staff who is building the warehouse decides what data gets put in the warehouse.
[Cartoon: "LETS PUT INFORMATION ON HOW TO BUILD VIRUSES IN THE DATA WAREHOUSE"]

"Let Data Warehouse Builders Decide" Approach
• Pro
– Very easy to design
– Does not take much time
– Do not have to deal with users
• Con
– Could easily result in data DUMP not data warehouse
• Ways to mitigate the con
– Talk to lots of users to help you guess what should go in the DW

“Let Senior Management Decide”
• The senior management decides what data goes into the warehouse.
• Identify the key questions on senior management’s mind and get the data to answer these questions.
• Asking the senior management is the safest way to build a data warehouse.

“Let Senior Management Decide” Approach
• Pro
– Ensures executive support for the project
• Con
– Senior management does not have much time for this -- you will have to only get a few questions at a time
– This dramatically increases visibility -- if you do not move quickly senior management will become very angry with the DW.
• Ways to mitigate the con
– Do your homework before talking to the senior management -- talk to the aides of senior management to find out what is on their mind.
– Allocate resources so you can plan to move very quickly once you hear from the senior management.

Bottom-Up Approach
• Move data from existing OLTP Applications to data marts.
• Combine data marts into a data warehouse.
[Diagram: three OLTP APPs feed three DATA MARTs (25 YARDS, 50 METERS, 200 CM), which are combined into the DATA WAREHOUSE]

Bottom-Up Approach
• Pro
– Data marts are much easier to build than a full-fledged DW.
• Con
– Could end up with a bunch of stovepipe data marts.
• Ways to mitigate the con
– Develop standards for data when building the data marts so that you can glue data from different data marts together.
201

Recommendations for an Approach
"Let senior management decide"
202

User Interface to the Data Warehouse
(slides in this section are used courtesy of Carrig Emerging Technology Ph: 410.553.6760 www.carriget.com)
203

User Interface to the Data Warehouse
1) Introduction
2) Types of Users
3) Functions Users Want to Do
4) Approaches to Building a User Interface
5) Hand Built
6) Class Libraries
7) OLAP Tools
8) Types of User Interfaces
204

Introduction
• A User Interface (UI) is a front-end application designed for the user that presents information in a simplified manner.
– Data in a data warehouse does nothing if users cannot access it
– Users do not want to learn SQL to drive DW applications
[Figure: Finance, Inventory, and Sales OLTP applications and their data feed the data warehouse, which sits behind the user interface]
205

Building User Interfaces
• DW applications have different types of users with different functionality requirements.
– It is critical to identify the key users.
– Once you do this, you need to identify their functional requirements.
• There are three main approaches to building UIs
– Build your own entirely
– Use commercial class libraries
– Use OLAP tools
206

Types of Users
[Figure: organization chart with the CEO above three departments (Marketing, Sales, Finance); each department has an Executive, Analysts, and Everyone else]
207

Types of Users (cont.)
• Executives
– People who run the place
– Need answers quickly
– May not be very technical
– Expect UI to get them what they want quickly and efficiently without any need for special training

• Analysts
– Have time to really analyze data and think about it
– May have strong statistical and IT background (e.g., power user of Excel)
– Expect UI to have many complex features, and provide the ability to generate new queries and perform statistical analysis of the data.
208


Types of Users (cont.)
• Regular User
– All other users
– Just need some simple answers to simple questions like “What is Hank’s phone number?”
– Expect UI to be simple, easy to understand, and provide access to basic information.

209

Subject Matter Experts Expect
• Query data in the data warehouse
• Trend analysis
– “show me how much money we have spent on computers in the last four years”
[Figure: trend line of Sales from 1995 to 1999]
• Benchmark to competitors
– “what are all our competitors charging for product X”
210
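The trend-analysis request above ("how much money we have spent on computers in the last four years") maps naturally onto a grouped query. Here is a minimal sketch using Python's built-in `sqlite3` module; the `purchases` table, its columns, and all figures are hypothetical and stand in for whatever the real warehouse holds.

```python
import sqlite3

# Hypothetical schema and data; a real warehouse would have its own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (year INTEGER, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1995, "computers", 100.0),
        (1996, "computers", 120.0),
        (1996, "furniture", 40.0),
        (1997, "computers", 90.0),
        (1998, "computers", 150.0),
        (1998, "computers", 50.0),
        (1999, "computers", 210.0),
    ],
)


def spending_trend(conn, category, last_year, n_years):
    """Return (year, total) rows for the n_years ending at last_year."""
    first_year = last_year - n_years + 1
    return conn.execute(
        "SELECT year, SUM(amount) FROM purchases "
        "WHERE category = ? AND year BETWEEN ? AND ? "
        "GROUP BY year ORDER BY year",
        (category, first_year, last_year),
    ).fetchall()


trend = spending_trend(conn, "computers", last_year=1999, n_years=4)
for year, total in trend:
    print(year, total)
```

A user interface would hide this SQL behind a form or chart, which is the point of the earlier remark that users do not want to learn SQL.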


Subject Matter Experts Expect (cont.)
• Drill Down
– “on that chart you just showed me, I noticed that revenue was down in Region #4. Please drill down and show me the breakdown of each area in Region #4.”
[Figure: bar chart of Wal-Mart revenue (0 to 20) by region (1 to 4), with a drill-down of Region 4 revenue into MD, DC, and VA]
211
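Drill down amounts to re-aggregating the same facts at a finer level for one selected member. The sketch below illustrates this in plain Python, assuming hypothetical (region, area, revenue) rows; every figure, and the areas listed for regions 1 to 3, are made up for illustration.

```python
from collections import defaultdict

# Hypothetical revenue facts: (region, area, revenue). Figures are made up.
FACTS = [
    (1, "NY", 12.0), (1, "NJ", 6.0),
    (2, "OH", 15.0),
    (3, "TX", 10.0), (3, "OK", 4.0),
    (4, "MD", 3.0), (4, "DC", 1.0), (4, "VA", 2.0),
]


def revenue_by_region(facts):
    """Top-level view: total revenue for each region."""
    totals = defaultdict(float)
    for region, _area, revenue in facts:
        totals[region] += revenue
    return dict(totals)


def drill_down(facts, region):
    """Finer view: revenue per area within one region."""
    totals = defaultdict(float)
    for fact_region, area, revenue in facts:
        if fact_region == region:
            totals[area] += revenue
    return dict(totals)


print(revenue_by_region(FACTS))  # the chart the executive first sees
print(drill_down(FACTS, 4))      # the breakdown requested for Region 4
```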

Approaches to Building User Interfaces
• Hand-Built
– Write all of your own code

• Use Class Libraries
– Use an object-oriented approach and buy the class libraries that do all the hard work

• OLAP
– Use an On-Line Analytical Processing package to build user interfaces for you.

212


Architecture of User Interfaces (cont.)
• Hand Built
[Figure: a hand-built user interface (i.e. Java) on top of the data warehouse DBMS]
• Class Libraries
[Figure: a hand-built user interface calling Commercial Off The Shelf (COTS) class libraries: graphics class library, OLAP class library, user interface class library]
213

Architecture of User Interfaces (cont.)
• OLAP
[Figure: a Commercial Off The Shelf (COTS) user interface showing a result cube (labeled year, store, region, revenue) drawn from the DBMS]
214
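The result cube in the OLAP architecture figure can be pictured as revenue stored in cells addressed by year, store, and region. The sketch below is one simple way to model that in Python, not how any particular OLAP product works; the rows and the helper names `build_cube` and `roll_up` are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("year", "store", "region")

# Hypothetical fact rows: (year, store, region, revenue).
ROWS = [
    (1998, "S1", "East", 10.0),
    (1998, "S2", "West", 7.0),
    (1999, "S1", "East", 12.0),
    (1999, "S1", "East", 3.0),
    (1999, "S2", "West", 8.0),
]


def build_cube(rows):
    """Aggregate revenue into cells keyed by (year, store, region)."""
    cube = defaultdict(float)
    for year, store, region, revenue in rows:
        cube[(year, store, region)] += revenue
    return dict(cube)


def roll_up(cube, keep):
    """Sum the cube over every dimension not named in keep."""
    idx = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(float)
    for cell, revenue in cube.items():
        result[tuple(cell[i] for i in idx)] += revenue
    return dict(result)


cube = build_cube(ROWS)
print(roll_up(cube, ["year"]))
print(roll_up(cube, ["year", "region"]))
```

Because the dictionary holds only cells that actually have data, empty combinations of year, store, and region take no space in this sketch.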

Hand-Building User Interfaces
• Write all the code yourself
– Requires many design documents, coding and testing for all of the code components.
• Pros
– Very flexible
• Cons
– Could take a long time to develop
– Requires substantial resources
– May need lots of testing and debugging
215

Using Class Libraries to Build User Interfaces
• Write initial user dialog yourself and call class libraries for the hard part (graphics and data access functionality).
• Pro
– Many class libraries available: avoid doing a lot of coding yourself
• Con
– Not as flexible: if the class library does not do what you want it to do, you have to find a new class library or live without the functionality
– Can take a while to find the class library you need and learn how to interface to it
216

Using OLAP Tools to Build User Interfaces
• Many different OLAP tools
– Need to survey the OLAP tools
– Buy an OLAP tool
– Install it
– If it does not match all requirements, some code may be needed to communicate with the OLAP tool.
• Four types of OLAP
– Relational OLAP (ROLAP)
– Multi-dimensional OLAP (MOLAP)
– Hybrid OLAP (HOLAP)
– Distributed OLAP (DOLAP)
217

Summary of Tools for UI Development of DW
• Tools that may be used include:
– Development of in-house software: do it all yourself, or use class libraries
– OLAP: ROLAP, MOLAP, HOLAP, DOLAP
• Different tools or techniques may be useful depending upon what kind of user interface is being developed.
– Executive Information Systems
– Analytical Systems
– Enterprise Information Systems
218

Types of User Interfaces
• Executive Information System
– Developed for the person who runs the place
• Analytical System
– Developed for business analysts
• Enterprise Information System
– Developed for users throughout the organization
[Figure: the organization chart of CEO, executives, Marketing/Sales/Finance analysts, and everyone else, overlaid with the Executive Information System, the Analytical System, and the Enterprise Information System]
219

Executive Information System
• The Executive IS is developed specifically for people who run the organization.
• Development process:
– No clean life cycle
– Prototype constantly. Usually have to guess at what executives will want to see
– Show executives the prototype and let them come up with ideas for revisions
– Drill down functionality required
• Tools
– Frequently hand-built, but purchasing a class library can help lower the development cost.
– May just want to use tools that allow development of a subscription service in which users may “subscribe” to a few canned reports.
220

Analytical System
• Analytical systems are user interfaces developed for business analysts in an organization.
• Development process:
– Allow users to drag-and-drop data around to further the analysis of this data.
– More complex interface is acceptable
– Users may be required to know some SQL
• Tools:
– OLAP Tools are frequently used to build the interface
221

Enterprise Information System
• Enterprise IS is written for the general user to retrieve simple, key information.
– Simpler than Executive IS as it does not require drill down functionality.
• Development process:
– Frequently developed in-house
– So many users around that you really cannot pick a few and ask what they need.
• Tools
– Place some simple, key information on a few screens, control access, and then deploy.
222

Summary of Types of User Interfaces
• Executive Information System
– For the senior executives
– Use in-house development or in-house development augmented by class libraries
• Analytical System
– OLAP may make sense here as the interface is more complicated, but OLAP has drawbacks: data sparseness and no well-accepted query language
• Enterprise Information System
– Much simpler than executive system
– Good candidate for in-house development
223
