
Lecture 1,2 Data Mining

An introduction to Data, Metadata, and DBMS


Definitions

• Data: stored representations of meaningful
objects and events
– Structured: numbers, text, dates
– Unstructured: images, video, documents
• Database: organized collection of logically
related data
• Information: data processed to increase
knowledge of a person using the data
• Metadata: data that describes the properties
and context of user data
SOLUTION: The DATABASE Approach

• Central repository of shared data


• Data is managed by a controlling
agent
• Stored in a standardized,
convenient form

Database Management System
▪ A software system that is used to create, maintain, and provide
controlled access to user databases

[Figure: the Order Filing System, Invoicing System, and Payroll System all reach a central database through the DBMS. The central database contains employee, order, inventory, pricing, and customer data.]

DBMS manages data resources like an operating system
manages hardware resources
Figure 1-3 Comparison of enterprise and project level data models
Figure 1-5 Components of the Database Environment

Components of the
Database Environment
• CASE Tools – computer-aided software engineering
• Repository – centralized storehouse of metadata
• Database Management System (DBMS) –
software for managing the database
• Database – storehouse of the data
• Application Programs – software using the data
• User Interface – text and graphical displays to users
• Data/Database Administrators – personnel
responsible for maintaining the database
• System Developers – personnel responsible for
designing databases and software
• End Users – people who use the applications and
databases
The Range of Database
Applications
• Personal databases
• Two-tier Client/Server databases
• Multitier Client/Server databases
• Enterprise applications
– Enterprise resource planning (ERP) systems
– Data warehousing implementations

Figure 1-8a Evolution of database technologies

• Data Warehouse: an introduction

– Is a repository of selected operational data, organized so that
complex queries can be answered. It is at the centre of a
decision support system (DSS).
– Larger and larger amounts of data have to be processed faster
and faster.
– Enables easy organization and maintenance of large data in
addition to fast retrieval and analysis.
– Mostly hosted on mainframe servers and clouds

Data Marts
▪ From a data warehouse, data flows to various departments for
their customized DSS usage. These individual departmental
components are called data marts.
▪ A data mart is a subset of a data warehouse and is much more
popular than the data warehouse.

Reasons?
▪ As the data warehouse grows, it becomes more complex.
▪ The cost of processing the data also increases as the volume increases.
▪ The data becomes harder to customize.
▪ When the data warehouse is small, the DSS can easily summarize the data.
▪ Software for accessing or analyzing large quantities of data may not be
as easy to use as software that processes small amounts of data.
▪ The department that owns the data mart can easily customize the data.
▪ The processing load and overload are also very limited in a data mart.
▪ The data mart can also be fed by data from external sources.

Loading a Data Mart
▪ The data mart is loaded with data from a data warehouse by
means of a load program.
▪ Main considerations for the load program are:
• Frequency
• Total or partial refreshment
• Customization of data from the warehouse (selection)
• Merging of data
• Summarization
• Efficiency
• Integrity of data
• Data relationship
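
The considerations above can be sketched as a small load program. This is only an illustration, assuming both the warehouse and the mart are SQLite databases; the table names `sales` and `monthly_sales`, the column names, and the function `load_mart` are hypothetical and do not come from the lecture. It shows selection (one region), summarization (monthly totals), and a choice between total and partial refresh.

```python
import sqlite3

def load_mart(warehouse, mart, region, full_refresh=True):
    """Load one region's sales into the mart, summarized by month and item."""
    mart.execute(
        "CREATE TABLE IF NOT EXISTS monthly_sales ("
        "month TEXT, item TEXT, total REAL, PRIMARY KEY (month, item))"
    )
    if full_refresh:
        # Total refresh: discard everything the mart currently holds.
        mart.execute("DELETE FROM monthly_sales")
    # Selection (WHERE) and summarization (GROUP BY) happen in the warehouse.
    rows = warehouse.execute(
        "SELECT substr(sale_date, 1, 7) AS month, item, SUM(amount) "
        "FROM sales WHERE region = ? GROUP BY month, item",
        (region,),
    ).fetchall()
    # The primary key keeps one row per (month, item), so a partial
    # refresh overwrites existing summaries instead of duplicating them.
    mart.executemany("INSERT OR REPLACE INTO monthly_sales VALUES (?, ?, ?)", rows)
    mart.commit()
    return len(rows)

# Hypothetical warehouse contents, for illustration only.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (sale_date TEXT, region TEXT, item TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("2024-01-05", "north", "pen", 10.0),
    ("2024-01-20", "north", "pen", 5.0),
    ("2024-02-02", "north", "ink", 7.5),
    ("2024-01-11", "south", "pen", 99.0),
])

mart = sqlite3.connect(":memory:")
loaded = load_mart(warehouse, mart, "north")
print(loaded, mart.execute("SELECT * FROM monthly_sales ORDER BY month, item").fetchall())
```

How often such a program runs (the frequency consideration) would be decided outside the function, for example by a scheduler.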

Metadata for Data Mart
▪ Metadata describes the details (properties) of the data in a data
warehouse or in a data mart.
▪ Following are the components of metadata:
• Description of the source of the data.
• Description of any customization that may have taken place as
the data passes from the data warehouse into the data mart.
• Information regarding the data mart itself (its tables, relationships,
etc.).
• Definitions of all types
▪ Metadata is created and updated by the load programs that
move the data from the data warehouse to the data mart.
▪ The linkage and relationship between the metadata of the
warehouse and the metadata of the mart have to be well
established and well understood by the manager or analyst
using the metadata.
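
One possible way to picture these components is as a single metadata record that a load program updates on every run. The sketch below is an assumption for illustration: the class `MartMetadata`, its field names, and the function `record_load` are hypothetical, not a standard metadata format.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class MartMetadata:
    source: str              # description of the source of the data
    customization: str       # what changed between warehouse and mart
    tables: dict             # the mart's own tables and their columns
    definitions: dict = field(default_factory=dict)  # meaning of each item
    last_loaded: str = ""    # filled in by the load program

def record_load(meta):
    """Called by a (hypothetical) load program after each successful load."""
    meta.last_loaded = datetime.now(timezone.utc).isoformat()
    return meta

meta = MartMetadata(
    source="warehouse table 'sales'",
    customization="rows for region 'north', summarized by month and item",
    tables={"monthly_sales": ["month", "item", "total"]},
    definitions={"total": "sum of sale amounts for the month"},
)
record_load(meta)
print(asdict(meta))
```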
Data Model for a Data Mart
▪ For a large data mart which may have some processing
involved, a formal data model is required.

▪ For a simple small data mart which may not have any
processing involved, no data model is required.

Maintenance of a Data Mart
▪ Periodic maintenance of a data mart means loading,
refreshing, and purging the data in it.
• daily basis (weather forecast)
• weekly basis (prices)
• monthly basis
• quarterly basis
• yearly basis
• after every 10 years (census)

▪ The data mart is read periodically and some old data is selected
for purging.
▪ The criteria for purging depend on the data, time, and periodicity.
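
A time-based purge can be sketched as follows. This is a minimal illustration, assuming a SQLite mart with a hypothetical `forecast` table whose `loaded_on` column holds ISO dates; the retention period `keep_days` is likewise an assumed parameter.

```python
import sqlite3
from datetime import date, timedelta

def purge_old_rows(conn, today, keep_days):
    """Delete rows loaded more than keep_days before today; return the count."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    # ISO dates (YYYY-MM-DD) sort correctly when compared as text.
    cur = conn.execute("DELETE FROM forecast WHERE loaded_on < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forecast (loaded_on TEXT, city TEXT, temp_c REAL)")
conn.executemany("INSERT INTO forecast VALUES (?, ?, ?)", [
    ("2024-03-01", "Lahore", 24.0),
    ("2024-03-09", "Lahore", 27.0),
    ("2024-03-10", "Lahore", 28.0),
])

purged = purge_old_rows(conn, today=date(2024, 3, 10), keep_days=7)
remaining = conn.execute("SELECT COUNT(*) FROM forecast").fetchone()[0]
print(purged, remaining)
```

A daily mart such as the weather forecast example would use a short retention period, while a census mart would keep data far longer.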

Nature of Data in a Data Mart
▪ Data can be:
• Detailed
• A summary
• Ad hoc
Software Components for a Data Mart
▪ Software found with a data
mart includes:
• DBMS
• Access and analysis
• Software for purging
• Software for metadata management
External Data
• External data that is required by more than
one data mart should first be placed in the
data warehouse itself and then subsequently
moved to the data marts that require it.
• This avoids duplication because the data is centralized.
• Details about the external data must also be
stored in addition to the data itself.
Performance
▪ Good performance can be achieved in
a data mart environment by:
– Extensive indexing
– Using star joins
– Limiting the volume of the data
– Creating arrays of data
– Creating profile records
– Creating pre-joined tables
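
Two of these techniques, indexing and pre-joined tables, can be illustrated with a small sketch. It assumes a SQLite mart; the tables `store`, `sales`, and `sales_by_city` and their contents are hypothetical. The pre-joined table stores the join result once, so later queries read it directly instead of repeating the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (sid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (sid INTEGER, iid INTEGER, price REAL);
INSERT INTO store VALUES (1, 'Lahore'), (2, 'Karachi');
INSERT INTO sales VALUES (1, 10, 5.0), (1, 11, 2.5), (2, 10, 4.0);

-- Indexing: index the column used to join and filter the fact rows.
CREATE INDEX idx_sales_sid ON sales (sid);

-- Pre-joined table: materialize the join once, at load time.
CREATE TABLE sales_by_city AS
    SELECT st.city AS city, sa.iid AS iid, sa.price AS price
    FROM sales AS sa JOIN store AS st ON st.sid = sa.sid;
""")

# The same question asked with a join and against the pre-joined table.
join_result = conn.execute(
    "SELECT st.city, SUM(sa.price) FROM sales AS sa "
    "JOIN store AS st ON st.sid = sa.sid "
    "GROUP BY st.city ORDER BY st.city"
).fetchall()
prejoined_result = conn.execute(
    "SELECT city, SUM(price) FROM sales_by_city GROUP BY city ORDER BY city"
).fetchall()
print(join_result == prejoined_result, prejoined_result)
```

The trade-off is that a pre-joined table must be rebuilt or updated whenever the underlying tables are refreshed.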
Star Schema for Multidimensional View

- The relationships of individual data entities in the data
warehouse have to be understood by the data model
designer for the data warehouse.
- Such a modeling exercise may lead to a star schema.
- The star schema so developed will effectively handle
data navigation difficulties and performance issues.

[Figure: Star schema for a sales analysis application. A central fact table (sales facts) is surrounded by dimension tables: fixed time period, market requirement, region, and products.]


Star Schema for Multidimensional View cont...

Sales (store id, item id, customer id, price)

Store (dimension): Sid
Item (dimension): Iid
Customer (dimension): Cid
Sales (fact): Sid, Iid, Cid, Price
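
The Sales example can be written out as tables and queried along each dimension. The sketch below assumes SQLite; the descriptive columns `city`, `name`, and `segment`, the sample rows, and the helper `total_by` are hypothetical additions for illustration. Only the keys Sid, Iid, Cid and the Price measure come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per store, item, and customer.
CREATE TABLE store    (sid INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE item     (iid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (cid INTEGER PRIMARY KEY, segment TEXT);

-- Fact table: one row per sale, keyed by the three dimensions.
CREATE TABLE sales (
    sid INTEGER REFERENCES store(sid),
    iid INTEGER REFERENCES item(iid),
    cid INTEGER REFERENCES customer(cid),
    price REAL
);

INSERT INTO store VALUES (1, 'Lahore'), (2, 'Karachi');
INSERT INTO item VALUES (10, 'pen'), (11, 'ink');
INSERT INTO customer VALUES (100, 'retail'), (101, 'wholesale');
INSERT INTO sales VALUES (1, 10, 100, 5.0), (1, 11, 101, 2.5),
                         (2, 10, 101, 4.0), (2, 10, 100, 6.0);
""")

def total_by(dimension_column):
    """Star join: total price grouped by one attribute of one dimension."""
    # Only known column names are interpolated into the SQL text.
    allowed = {"city": "st.city", "name": "it.name", "segment": "cu.segment"}
    col = allowed[dimension_column]
    sql = (
        f"SELECT {col}, SUM(sa.price) FROM sales AS sa "
        "JOIN store AS st ON st.sid = sa.sid "
        "JOIN item AS it ON it.iid = sa.iid "
        "JOIN customer AS cu ON cu.cid = sa.cid "
        f"GROUP BY {col} ORDER BY {col}"
    )
    return conn.execute(sql).fetchall()

print(total_by("city"))
print(total_by("name"))
print(total_by("segment"))
```

Each call views the same facts along a different dimension, which is the multidimensional view the star schema is meant to provide.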
Star Schema for Multidimensional View cont...

- The star schema provides a multidimensional view
- Without supporting techniques such as indexing and star joins, the star schema may not be effective in an RDBMS
Star Schema or Snowflake Schema

- As the data warehouse grows in complexity, the
diversity of subjects grows.
- In such a situation, the star schema will be inadequate.
- It can be enhanced by adding additional dimensions,
which increase the scope of attributes in the star
schema table.
- Thus a better technique is the multifactor star schema or
the snowflake schema.
Monitoring Requirements for a Data Mart
▪ Periodic monitoring of data mart behavior is
required.
▪ Data usage tracking:
• What data is being accessed
• Which users are active
• What is the quantity of data accessed
• What are the usage timings

▪ Data content tracking:
• What are the actual contents of the data mart
• What is the best way of accessing the data
• Is there any bad data in the data mart
• How, and by how much, the data mart is growing
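
The data usage questions above can be answered from an access log. The following is a minimal sketch, assuming the mart records one entry per query; the `Access` record, its fields, and the function `summarize_usage` are hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Access:
    user: str       # who ran the query
    table: str      # which data was accessed
    rows_read: int  # quantity of data accessed
    hour: int       # hour of day (0-23) when the access happened

def summarize_usage(log):
    """Summarize a non-empty access log along the four usage questions."""
    return {
        "tables": Counter(a.table for a in log),
        "active_users": sorted({a.user for a in log}),
        "rows_read": sum(a.rows_read for a in log),
        "busiest_hour": Counter(a.hour for a in log).most_common(1)[0][0],
    }

# Hypothetical log entries, for illustration only.
log = [
    Access("amna", "monthly_sales", 120, 9),
    Access("bilal", "monthly_sales", 40, 9),
    Access("amna", "forecast", 10, 14),
]
usage = summarize_usage(log)
print(usage)
```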
Security in a Data Mart

▪ The Data Mart Administrator should make
necessary security arrangements such as:
– Firewalls
– Encryption and decryption
- It is used for enforcing security in distributed systems, data
warehouses, etc.
- Symmetric and asymmetric keys.
Jazak Allah !!
