# Batch E8: Case Study on Big University

Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. (a) Draw a snowflake schema diagram for the data warehouse. (b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student? (c) What is a staging area? Do we need it? What is the purpose of a staging area?
Problem 4: (25 points) Do problem 3.4 on page 152, i.e., the Big University data warehouse described above, with parts (a) and (b) as stated there, and with a different part (c): If each dimension has five levels (including all), such as "student < major < status < university < all", how many cuboids will this cube contain (including the base and apex cuboids)? Solution: (a)

(b)

Problem 3: (25 points) Do problem 3.3 on page 152. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.

(a) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
(b) Draw a schema diagram for the above data warehouse using one of the schema classes listed in (a).
(c) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?
(d) To obtain the same list, write an SQL query assuming the data are stored in a relational database with the schema fee(day, month, year, doctor, hospital, patient, count, charge).

Solution:

(a) Star schema: a fact table in the middle connected to a set of dimension tables. Snowflake schema: a refinement of the star schema in which some dimensional hierarchies are normalized into sets of smaller dimension tables, forming a shape similar to a snowflake. Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore also called a galaxy schema.

(b) As in the figures below.

(c) 1. Roll up on time from day to month to year. 2. Slice for year = "2004". 3. Roll up on patient from individual patient to all. 4. Slice for patient = "all". This yields the list of the total fee collected by each doctor in 2004.

(d) Select doctor, Sum(charge) From fee Where year = 2004 Group by doctor.

Problem 4 (b): Starting with the base cuboid [student, course, semester, instructor]: 1. roll up on course from course_key to major; 2. roll up on student from student_key to university; 3. dice on course and student with department = "CS" and university = "Big University"; 4. drill down on student from university to student name. (c) The cube will contain 5^4 = 625 cuboids.
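The SQL query in part (d) can be sanity-checked against a toy relation. The sketch below builds the fee table in an in-memory SQLite database; the doctor names, patients, and charges are invented purely for illustration:

```python
import sqlite3

# In-memory stand-in for the relational table
# fee(day, month, year, doctor, hospital, patient, count, charge).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fee (
    day INTEGER, month INTEGER, year INTEGER, doctor TEXT,
    hospital TEXT, patient TEXT, count INTEGER, charge REAL)""")
conn.executemany("INSERT INTO fee VALUES (?,?,?,?,?,?,?,?)", [
    (5, 1, 2004, "Dr. Smith", "General", "Alice", 1, 100.0),
    (9, 3, 2004, "Dr. Smith", "General", "Bob",   1,  50.0),
    (2, 7, 2004, "Dr. Jones", "General", "Carol", 1,  80.0),
    (4, 2, 2003, "Dr. Smith", "General", "Dave",  1, 999.0),  # not in 2004
])

# The query from part (d): total fee collected by each doctor in 2004.
totals = dict(conn.execute(
    "SELECT doctor, SUM(charge) FROM fee WHERE year = 2004 GROUP BY doctor"))
print(totals)  # 2004 totals: Dr. Smith 150.0, Dr. Jones 80.0
```

The 2003 row is filtered out by the WHERE clause, which is exactly why the flattened fee table carries year as an ordinary column.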

Problem 3.4 (P116). Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, along with two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination.

(a) Draw a snowflake schema diagram for the data warehouse.

Answer: The schema contains a central fact table, Big_university, that holds keys to each of the four dimension tables; the student and instructor hierarchies are further normalized into major and office tables, giving the snowflake shape:

- Big_university fact table: semester_key, course_key, student_key, instructor_key, count, avg_grade
- semester dimension table: semester_key, quarter, year
- student dimension table: student_key, student_no, name, age, sex, class, major_key
- major dimension table: major_key, major_type
- course dimension table: course_key, course_number, course_name, property, credit
- instructor dimension table: instructor_key, name, age, office_key
- office dimension table: office_key, office_telephone, office_address

(Figure 3.4: Snowflake schema of a data warehouse for Big_university.)

(b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student?

Answer: Roll-up performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension (e.g., the total order quarter < year) or by dimension reduction; the slice operation performs a selection on one dimension of the given cube, and a dice selects on two or more dimensions, resulting in a subcube. We use the following specific OLAP operations:

1. Roll up on course from course_key to major.
2. Roll up on student from student_key to university.
3. Dice on course and student with department = "CS" and university = "Big University".
4. Drill down on student from university to student name.

(c) If each dimension has five levels (including all), such as "student < major < status < university < all", how many cuboids will this cube contain (including the base and apex cuboids)?

Answer: For an n-dimensional data cube, the total number of cuboids that can be generated (including the cuboids generated by climbing up the hierarchies along each dimension) is

Total number of cuboids = ∏_{i=1}^{n} (L_i + 1),

where L_i is the number of levels associated with dimension i; one is added to L_i to include the virtual top level, all. This cube has 4 dimensions, and each dimension has 5 levels including all (so L_i = 4). Therefore the total number of cuboids it can contain is 5^4 = 625.
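The operation sequence in part (b) can be imitated in plain Python. This is only a sketch: the sample records, student names, and grades are invented, all sample students attend Big University, and the roll-up and dice are simulated with an in-memory group-by rather than a real OLAP engine:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical base-cuboid cells, invented for illustration.
# Each record: (student, course, course_major, semester, instructor, grade)
base = [
    ("Ann", "CS101", "CS",      "F23", "Smith", 3.7),
    ("Ann", "CS102", "CS",      "S24", "Jones", 3.3),
    ("Ann", "HI200", "History", "F23", "Lee",   2.0),
    ("Bob", "CS101", "CS",      "F23", "Smith", 3.0),
]

# Steps 1-3: rolling up course -> major and student -> university, then dicing
# on major = "CS" and university = "Big University", amounts to keeping only
# the CS records; step 4 drills back down to individual students.
by_student = defaultdict(list)
for student, course, major, semester, instructor, grade in base:
    if major == "CS":                      # dice on the course's major
        by_student[student].append(grade)  # aggregate away course/semester/instructor

avg_grade = {s: round(mean(g), 2) for s, g in by_student.items()}
print(avg_grade)  # {'Ann': 3.5, 'Bob': 3.0}
```

Ann's History grade is excluded by the dice, so her avg_grade reflects only CS courses, which is exactly the list the question asks for.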

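The cuboid count in part (c) is easy to verify numerically; the snippet below simply evaluates the product of (L_i + 1) over four dimensions, each with L_i = 4 levels below the virtual top level all:

```python
from math import prod

# Four dimensions (student, course, semester, instructor); each has five
# levels including the virtual top level "all", hence L_i = 4 without it.
levels = [4, 4, 4, 4]

# Total cuboids = product over all dimensions of (L_i + 1).
total = prod(L + 1 for L in levels)
print(total)  # 625
```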
In short. which in fact can severely hamper the performance of the OLTP system. The Data Warehouse Staging Area is temporary location where data from source systems is copied. data processing cycles. only remains around temporarily). before loading the data into warehouse. Staging tables are connected to work area or fact tables. A staging area is mainly required in a Data Warehousing Architecture for timing reasons. and perform data cleansing and merging .In the absence of a staging area. For many businesses it is feasible to use ETL to copy data directly from operational databases into the Data Warehouse. This is the primary reason for the existence of a staging area. We basically need staging area to hold the data . remains around for a long period) or transient (i. Similarly. but this would not be feasible for "customer" data in a Chicago database. it might be feasible to extract "customer" data from a database in Singapore at noon eastern standard time. all required data must be available before data can be integrated into the Data Warehouse.e. Due to varying business cycles. however. daily extracts might not be suitable for financial data that requires a month-end reconciliation process.What is a staging area?Do we need it?What is the purpose of a staging area? Staging area is place where you hold temporary tables on data warehouse server. it might be reasonable to extract sales data on a daily basis. it also offers a platform for carrying out data cleansing. hardware and network resource limitations and geographical factors. it is not feasible to extract all the data from all Operational databases at exactly the same time. In addition. the data load will have to go from the OLTP system to the OLAP system directly. Data in the Data Warehouse can be either persistent (i.e. . For example. Not all business require a Data Warehouse Staging Area.