sales.
o Focused on the modeling and analysis of data for decision makers,
by excluding data that are not useful in the decision support process.
attribute measures, etc. among different data sources. When data is moved to the warehouse, it is converted.
o E.g., sales data may be in a relational database (RDB) and customer information in flat files.
element
environment
o Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and
Heterogeneous Databases
o Consists of a set of interconnected, autonomous databases.
o Objects in one database may differ from objects in other
Operational DBMS
o They consist of tables with a set of attributes and store a large set of tuples.
o They use the Entity-Relationship (ER) data model.
o They are used to store transactional data.
o They contain the most current information.
o They are thus known as Online Transaction Processing (OLTP) systems.
o Data contents: current, detailed vs. historical, consolidated
o Database design: ER + application vs. star + subject
o View: current, local vs. evolutionary, integrated
o Access patterns: update vs. read-only but complex queries
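The difference in access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the sales table, its columns, and the sample rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

# Hypothetical table and rows, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, year INTEGER, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (year, item, amount) VALUES (?, ?, ?)",
    [(2022, "TV", 400.0), (2022, "PC", 900.0), (2023, "TV", 450.0), (2023, "PC", 950.0)],
)

# Operational (OLTP-style) access: a short transaction that updates one record.
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE sale_id = ?", (420.0, 1))

# Warehouse-style access: a read-only query that consolidates historical rows.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(yearly_totals)
```

The update touches a single row by key, while the second query scans and summarizes every row, which is the kind of workload the warehouse side is designed for.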
operational tasks.
o Decision support requires historical data which operational
heterogeneous sources.
o Solution
A data cube models n-dimensional (n-D) data and is defined by dimensions and facts.
o Dimensions: the entities with respect to which an organization wants to keep records, such as item (item_name).
o Facts: the subjects of decision-oriented analysis, such as dollars_sold or units_sold. Facts are numerical measures: the quantities by which we want to analyze relationships between dimensions.
o The fact table contains the measures and a key to each of the related dimension tables.
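As a rough illustration of dimensions and facts, the sketch below sums one measure (dollars_sold) over chosen dimensions. The sample rows, the three dimension names, and the aggregate helper are assumptions made for this example, not part of any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, item, location, dollars_sold).
facts = [
    ("Q1", "TV", "Vancouver", 400.0),
    ("Q1", "PC", "Vancouver", 900.0),
    ("Q2", "TV", "Toronto", 450.0),
    ("Q2", "PC", "Vancouver", 950.0),
]
DIMENSIONS = ("time", "item", "location")


def aggregate(rows, group_by):
    """Sum dollars_sold for each combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]  # the measure is the last field
    return dict(totals)


by_item = aggregate(facts, ["item"])               # one dimension
by_time_item = aggregate(facts, ["time", "item"])  # two dimensions
grand_total = aggregate(facts, [])                 # no dimensions: overall sum
print(by_item, grand_total)
```

Each choice of group_by corresponds to one view of the cube; grouping by no dimension at all gives the single overall total.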
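[Figure: concept hierarchies and a sample data cube. Recovered labels: Industry, Product; Region, Country, City, Office; Year, Month, Week, Day. Cube axes: Product (TV, PC, VCR, sum), Date, Country.]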
[Figure: lattice of cuboids. Recovered labels:]
o 0-D (apex) cuboid
o 1-D cuboids
o 2-D cuboids: (time, location), (item, location), (location, supplier), (time, supplier), (item, supplier)
o 3-D cuboids: (time, location, supplier), (time, item, supplier), (item, location, supplier)
o 4-D (base) cuboid
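The full lattice can be enumerated mechanically. The sketch below assumes the four dimensions time, item, location and supplier that appear in the cuboid labels, and ignores concept hierarchies within a dimension; the cuboid_lattice helper is a name made up for this example.

```python
from itertools import combinations

# The four dimensions named in the cuboid labels.
dimensions = ["time", "item", "location", "supplier"]


def cuboid_lattice(dims):
    """Map k to the list of k-D cuboids, for k = 0 .. len(dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)

# Without hierarchies, n dimensions give 2**n cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

The 0-D entry is the apex cuboid (everything summarized into one value) and the 4-D entry is the base cuboid (no summarization).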
o Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
o item: item_key, item_name, brand, type, supplier_type
o branch: branch_key, branch_name, branch_type
o location: location_key, street, city, state_or_province, country
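A sales fact table with its dimension tables can be tried out with Python's built-in sqlite3 module. The sketch below keeps only the item and location dimensions to stay short (time_key and branch_key are left out), uses the lowercase table name sales_fact, and fills the tables with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    state_or_province TEXT, country TEXT);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item(item_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER,
    dollars_sold REAL);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (1, "TV-A", "Acme", "TV", "wholesale"),
    (2, "PC-B", "Acme", "PC", "retail"),
])
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
    (1, "1 Main St", "Vancouver", "BC", "Canada"),
    (2, "2 Lake Ave", "Chicago", "IL", "USA"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 2, 800.0), (2, 1, 1, 900.0), (1, 2, 3, 1200.0),
])

# Each dimension is exactly one join away from the fact table.
sales_by_country_type = conn.execute("""
    SELECT l.country, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item AS i ON i.item_key = f.item_key
    JOIN location AS l ON l.location_key = f.location_key
    GROUP BY l.country, i.type
    ORDER BY l.country, i.type
""").fetchall()
print(sales_by_country_type)
```

The fact table holds only keys and measures; every descriptive attribute used in GROUP BY comes from a dimension table.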
o Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...
o Time_Dim: TimeKey, TheDate, ...
o Product_Dim: ProductKey, ProductID, ...
o Shipper_Dim: ShipperKey, ShipperID, ...
o Customer_Dim: CustomerKey, CustomerID, ...
2. Snowflake schema: A refinement of the star schema in which some dimensional hierarchies are normalized into sets of smaller dimension tables, forming a shape similar to a snowflake.
[Figure: snowflake schema. Recovered tables and columns:]
o time: time_key, day, day_of_the_week, month, quarter, year
o item
o location: location_key, street, city_key
o branch: branch_key, branch_name, branch_type
o Fact table measures: units_sold, dollars_sold, avg_sales
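The normalization step can be sketched in plain Python. The example assumes a star-style location dimension (location_key, street, city, state_or_province, country) and splits it into a location table that keeps only city_key plus a separate city table. The city table's columns, the sample rows, and the snowflake_location helper are assumptions for illustration.

```python
# Hypothetical denormalized rows:
# (location_key, street, city, state_or_province, country)
star_location = [
    (1, "1 Main St", "Vancouver", "BC", "Canada"),
    (2, "9 Oak Rd", "Vancouver", "BC", "Canada"),
    (3, "2 Lake Ave", "Chicago", "IL", "USA"),
]


def snowflake_location(rows):
    """Split location rows into a location table and a city table."""
    city_keys = {}  # (city, state_or_province, country) -> city_key
    location = []
    for location_key, street, city, state, country in rows:
        attrs = (city, state, country)
        if attrs not in city_keys:
            city_keys[attrs] = len(city_keys) + 1  # assign the next surrogate key
        location.append((location_key, street, city_keys[attrs]))
    city = [(key, *attrs) for attrs, key in city_keys.items()]
    return location, city


location, city = snowflake_location(star_location)
print(location)
print(city)
```

The repeated city attributes are now stored once, at the cost of an extra join whenever a query needs the city or country of a location.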
Snowflakes are conglomerations of frozen ice crystals which fall through the Earth's atmosphere. They begin as two snow crystals which develop when microscopic supercooled cloud droplets freeze.
3. Fact constellation: Multiple fact tables share dimension tables. The schema can be viewed as a collection of stars and is therefore called a galaxy schema or fact constellation.
[Figure: fact constellation schema. Recovered tables and columns:]
o time: time_key, day, day_of_the_week, month, quarter, year
o item: item_key, item_name, brand, type, supplier_type
o location: location_key, street, city, province_or_state, country
o branch: branch_key, branch_name, branch_type
o shipper: shipper_key, shipper_name, location_key, shipper_type
o First fact table (partially recovered): time_key, item_key, branch_key
o Second fact table: time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped
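A query across two fact tables that share a dimension might look like the sketch below, again using Python's sqlite3 module. The table names sales_fact and shipping_fact, the reduced column sets, and the sample rows are assumptions. Each fact table is aggregated on its own before the results are combined through the shared item dimension, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item(item_key), dollars_sold REAL);
CREATE TABLE shipping_fact (
    item_key INTEGER REFERENCES item(item_key), units_shipped INTEGER);
-- Hypothetical sample rows.
INSERT INTO item VALUES (1, 'TV'), (2, 'PC');
INSERT INTO sales_fact VALUES (1, 400.0), (1, 450.0), (2, 900.0);
INSERT INTO shipping_fact VALUES (1, 5), (2, 3), (2, 4);
""")

# Aggregate each fact table separately, then join both on the shared dimension.
report = conn.execute("""
    SELECT i.item_name, s.dollars_sold, h.units_shipped
    FROM item AS i
    JOIN (SELECT item_key, SUM(dollars_sold) AS dollars_sold
          FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) AS h
      ON h.item_key = i.item_key
    ORDER BY i.item_key
""").fetchall()
print(report)
```

Because both fact tables reference the same item keys, their measures can be reported side by side per item.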
THANKS