
METADATA and METADATA IN WAREHOUSING

Topic 4.7, Page 62; Chapter 9, Pages 139-148, Data Warehousing in the Real World, Sam Anahory and Dennis Murray

PRESENTER: VINAY ARORA, I.T. 7th sem (C), 81005113089


22nd AUG 2011

META-DATA
Meta: from Greek, meaning "after", "beyond", "with", "adjacent", "self".
Data: a piece of information; an assumption or premise from which inferences may be drawn.

DEFINITIONS
Structured data about data. Increasingly this term refers to any data used to aid the identification, description and location of networked electronic resources. (swdb.berkeley.edu/glossary.html, Statewide Database, University of California)

Data about the content, quality, condition, and other characteristics of data. (www.fgdc.gov/metadata/csdgm/glossary.html, Federal Geographic Data Committee)

A set of data that describes and gives information about other data. (Wikipedia)

Metadata is generally defined as 'descriptive information about information' and refers to any data used to support the identification, description and location of an information object, such as a document. Simply put, metadata is the collection of labels that describe a piece of information. (www.namahn.com/resources/documents/note-metadata.pdf, human-centered design consultancy in Belgium)

Metadata is data about data. (Anahory)

Purpose of Meta-Data
The purpose of metadata is defined by what it serves:

Semantic analysis: data explaining the content of the information object, such as title, subject (or subject categories, taxonomies, ontologies), keywords, intended audience, content rating, and so forth.

Administration: data used for managing the information object, such as the author(s) of the resource, reviewer(s), version number, review date, property rights, and so forth.

Access and publishing: data that can be extracted directly from the information object, such as file name, size, extension, creation date, and so forth.
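As a minimal sketch, assuming a local file plays the role of the information object, the three purposes can be kept side by side: semantic and administrative metadata are supplied by people, while access and publishing metadata are read straight from the file system. The helper names `access_metadata` and `describe` are hypothetical. Because a portable creation time is not available on every platform, the sketch records the last-modified time as a stand-in.

```python
import os
from datetime import datetime, timezone


def access_metadata(path: str) -> dict:
    """Access/publishing metadata read directly from the file itself."""
    st = os.stat(path)
    name = os.path.basename(path)
    return {
        "file_name": name,
        "extension": os.path.splitext(name)[1],
        "size_bytes": st.st_size,
        # Portable stand-in for creation date: last-modified time (UTC).
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }


def describe(path: str, semantic: dict, administration: dict) -> dict:
    """Combine human-supplied metadata with metadata extracted from the file."""
    return {
        "semantic": semantic,              # e.g. title, keywords, audience
        "administration": administration,  # e.g. author, version, rights
        "access": access_metadata(path),   # extracted, not typed in
    }
```

Only the `access` part needs no human input, which is why it is the part most often generated automatically.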

Properties of Metadata
Metadata can be generated automatically or created by humans. They can be queried by a user, or used by software agents acting on a user's behalf. Metadata can be associated with a resource in the following ways:
1. Embedded directly in the information object, e.g. HTML meta tags in a web page.
2. Kept as a separate entity linked to or from the object they describe.
3. Stored in a remote database. The database record may have been created directly within the database or extracted from another source, such as a web page.
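The first case, metadata embedded in the object, can be illustrated with a small sketch using Python's standard `html.parser` module. The class and function names are hypothetical; it collects only `<meta name=... content=...>` pairs and ignores other meta tags such as `charset`.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs embedded in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values may be None.
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]


def extract_meta(html: str) -> dict:
    """Return the name -> content metadata embedded in an HTML document."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Self-closing tags such as `<meta ... />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.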

Meta-Data in Warehousing
Data warehouses are designed to manage and store data, whereas Business Intelligence (BI) focuses on using that data to support reporting and analysis. The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from the various operational systems in an organization.

Ralph Kimball* describes metadata as the DNA of the data warehouse, because metadata defines the elements of the data warehouse and how they work together.
* Kimball received his Ph.D. in electrical engineering from Stanford University in 1972, specializing in man-machine systems.

Categories of meta-data
TECHNICAL: Technical metadata defines the data model and the way it is displayed to users, along with the reports, schedules, distribution lists and user security rights.
(Tables, fields, data types, indexes and partitions in the relational engine; databases, dimensions, measures and data mining models.)

BUSINESS: Business metadata tells you what data you have, where it comes from, what it means and how it relates to other data in the data warehouse.

PROCESS: Process metadata describes the results of the various operations in the data warehouse.
(This includes start time, end time, CPU seconds used, disk reads, disk writes and rows processed.)
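Process metadata of this kind can be captured by wrapping each warehouse operation. The following is a minimal sketch with hypothetical names: it records start time, end time, CPU seconds and rows processed for one load step. Disk reads and writes are left out because measuring them is platform-specific.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class ProcessMetadata:
    operation: str
    start_time: datetime
    end_time: datetime
    cpu_seconds: float
    rows_processed: int


def run_with_process_metadata(operation: str, step: Callable,
                              rows: Iterable) -> ProcessMetadata:
    """Apply `step` to every row and record process metadata for the run."""
    start = datetime.now(timezone.utc)
    cpu_start = time.process_time()  # CPU time of this Python process
    count = 0
    for row in rows:
        step(row)
        count += 1
    return ProcessMetadata(
        operation=operation,
        start_time=start,
        end_time=datetime.now(timezone.utc),
        cpu_seconds=time.process_time() - cpu_start,
        rows_processed=count,
    )
```

Storing one such record per run is what lets the warehouse manager report how long loads took and how much work they did.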

Meta-Data in warehousing is used for:
I. Data Transformation and Load
Used to map data sources to a common view of information within the data warehouse. The metadata kept for each mapping includes:
Source field: unique identifier, name, type, location (system, object)
Destination: unique identifier, name, type, table name
Transformations: name, language, module, syntax
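The mapping metadata listed for transformation and load could be modelled as follows. This is a sketch, not the book's design: the class names are hypothetical, and the `apply` callable is an assumed executable form of the transformation's `syntax` field. The point it shows is that the load routine itself contains no field names; everything comes from the metadata.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SourceField:
    uid: str
    name: str
    type: str
    system: str  # location: the operational system
    obj: str     # location: object within that system (table, file, ...)


@dataclass
class DestinationField:
    uid: str
    name: str
    type: str
    table_name: str


@dataclass
class Transformation:
    name: str
    language: str
    module: str
    syntax: str
    apply: Callable[[Any], Any]  # hypothetical executable form of `syntax`


@dataclass
class FieldMapping:
    source: SourceField
    destination: DestinationField
    transformation: Transformation


def transform_row(row: Dict[str, Any],
                  mappings: List[FieldMapping]) -> Dict[str, Any]:
    """Build one warehouse row from one source row, driven only by metadata."""
    return {
        m.destination.name: m.transformation.apply(row[m.source.name])
        for m in mappings
    }
```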


II. Data Management
Used to automate the management of summary tables. The metadata kept includes:
Tables: columns (name, type)
Indexes: columns (name, type)
Constraints: name, type, table, columns
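One way this metadata can automate summary tables is by generating their DDL. A minimal sketch follows, assuming generic SQL `CREATE TABLE` / `CREATE INDEX` syntax; all class and function names are hypothetical, and the index `name` field is an addition because the slide lists only index columns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    type: str


@dataclass
class Index:
    name: str              # hypothetical: the slide lists only the columns
    columns: List[Column]


@dataclass
class Constraint:
    name: str
    type: str              # e.g. "PRIMARY KEY" or "UNIQUE"
    table: str
    columns: List[str]


@dataclass
class TableMeta:
    name: str
    columns: List[Column]
    indexes: List[Index] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)


def generate_ddl(t: TableMeta) -> List[str]:
    """Generate CREATE statements for a summary table from its metadata."""
    parts = [f"{c.name} {c.type}" for c in t.columns]
    parts += [
        f"CONSTRAINT {k.name} {k.type} ({', '.join(k.columns)})"
        for k in t.constraints
        if k.table == t.name  # only constraints recorded for this table
    ]
    statements = [f"CREATE TABLE {t.name} ({', '.join(parts)})"]
    for ix in t.indexes:
        cols = ", ".join(c.name for c in ix.columns)
        statements.append(f"CREATE INDEX {ix.name} ON {t.name} ({cols})")
    return statements
```

Regenerating the DDL from metadata, rather than editing scripts by hand, keeps the summary tables and their description in step.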


III. Query Generation
Used to direct a query to the most appropriate data source and to give information about the query executed. The metadata kept for each query includes:
Tables accessed
Columns accessed: name, reference identifier
Aggregate functions used: column name, aggregate function
Sort criteria: column name, sort direction
Syntax
Resources: disk reads/writes, CPU, memory
User
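Directing a query to the most appropriate source can be sketched as a lookup over source metadata: pick the smallest table whose grain and stored aggregates cover the query. This is a simplified illustration with hypothetical names, assuming the aggregates can be re-aggregated from a finer grain (true for SUM, not for AVG).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryMeta:
    tables_accessed: frozenset
    group_by_columns: frozenset
    aggregates: frozenset      # {(column_name, aggregate_function), ...}


@dataclass
class DataSource:
    name: str
    covers_tables: frozenset   # base tables this source was built from
    grain: frozenset           # columns the source is grouped by
    aggregates: frozenset      # {(column_name, aggregate_function), ...}
    row_count: int


def route_query(q: QueryMeta, sources: List[DataSource]) -> Optional[str]:
    """Return the name of the smallest source able to answer q, or None."""
    usable = [
        s for s in sources
        if q.tables_accessed <= s.covers_tables
        and q.group_by_columns <= s.grain
        and q.aggregates <= s.aggregates
    ]
    if not usable:
        return None
    return min(usable, key=lambda s: s.row_count).name
```

A query grouped only by region can then be answered from a summary grouped by day and region, while a query grouped by product must fall back to the detail table.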

Live Example
NTFS Architecture

NTFS Master File Table


1. Master file table: $MFT
2. Root folder: .
3. Boot sector: $Boot (located at the beginning of the partition)
4. Bad cluster file: $BadClus
5. Security file: $Secure
6. NTFS extension file: $Extend, which holds optional extensions to NTFS

MFT Records
Small files (< 900 bytes) are stored entirely within their MFT entry.

MFT Records
Folders contain index data. Small folders reside within the MFT record; larger folders use an index structure that points to other data blocks, organized as a B-tree.

MFT Attribute Layout


Attributes can be resident or non-resident. Every attribute begins with the same header:
Offset  Field
0x00    Attribute type identifier
0x04    Length of attribute
0x08    Non-resident flag
0x09    Length of name
0x0A    Offset to name
0x0C    Flags
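A minimal sketch of reading this common header with Python's standard `struct` module, assuming the little-endian byte order NTFS uses on disk. The function name is hypothetical, and any sample bytes used with it are synthetic rather than taken from a real volume.

```python
import struct

# Common attribute header, little-endian:
# 0x00 type (4), 0x04 length (4), 0x08 non-resident flag (1),
# 0x09 name length (1), 0x0A name offset (2), 0x0C flags (2)
ATTRIBUTE_HEADER = struct.Struct("<IIBBHH")


def parse_attribute_header(record: bytes, offset: int = 0) -> dict:
    """Decode the fields shared by every attribute, starting at `offset`."""
    type_id, length, non_resident, name_len, name_off, flags = (
        ATTRIBUTE_HEADER.unpack_from(record, offset)
    )
    return {
        "type": type_id,
        "length": length,
        "resident": non_resident == 0,
        "name_length": name_len,
        "name_offset": name_off,
        "flags": flags,
    }
```

The `<` prefix disables padding, so the six fields land exactly at offsets 0x00, 0x04, 0x08, 0x09, 0x0A and 0x0C. Adding `length` to `offset` gives the start of the next attribute in the record.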

Standard Info Attribute Layout


Offset  Size (bytes)  Field
0x00    8             File creation time
0x08    8             File alteration time
0x10    8             MFT change time
0x18    8             File read time
0x20    4             DOS file permissions
0x24    4             Maximum number of versions
0x28    4             Version number
0x2C    4             Class ID
0x30    4             Owner ID (Windows 2000 and later)
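The four timestamps are 64-bit NTFS time values: counts of 100-nanosecond intervals since 1 January 1601 UTC. A minimal Python sketch of decoding them, plus the DOS permissions field at 0x20, follows; the function names are hypothetical, and precision is truncated to microseconds because that is what `datetime` supports.

```python
import struct
from datetime import datetime, timedelta, timezone

# NTFS timestamps count 100-nanosecond intervals since 1601-01-01 00:00 UTC.
NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a 64-bit NTFS timestamp to an aware datetime (microseconds)."""
    return NTFS_EPOCH + timedelta(microseconds=filetime // 10)


def parse_standard_info_times(content: bytes) -> dict:
    """Read the four 8-byte timestamps at offsets 0x00, 0x08, 0x10, 0x18."""
    created, altered, mft_changed, read = struct.unpack_from("<QQQQ", content, 0)
    return {
        "file_creation": filetime_to_datetime(created),
        "file_alteration": filetime_to_datetime(altered),
        "mft_change": filetime_to_datetime(mft_changed),
        "file_read": filetime_to_datetime(read),
    }


def parse_dos_permissions(content: bytes) -> int:
    """Return the 4-byte DOS file permissions field at offset 0x20."""
    return struct.unpack_from("<I", content, 0x20)[0]
```

For orientation, the Unix epoch (1970-01-01 UTC) corresponds to the NTFS value 116444736000000000.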

MFT Attribute Example

Attribute type is 0x00000010 (stored little-endian as 10 00 00 00), i.e. the Standard Information attribute.

The attribute is 0x00000060 bytes long and is resident (non-resident flag 0x00). Its contents are 0x00000048 bytes long and start at offset 0x0018.
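The arithmetic in this example (0x18 + 0x48 = 0x60) can be checked with a small sketch. It assumes the standard resident-attribute layout, where the content length (4 bytes) sits at 0x10 and the content offset (2 bytes) at 0x14. The helper names are hypothetical, and the attribute bytes are synthetic, built with type 0x10 (the type code NTFS assigns to $STANDARD_INFORMATION) and zero-filled content.

```python
import struct


def resident_content(attr: bytes) -> bytes:
    """Return the contents of a resident attribute."""
    length, non_resident = struct.unpack_from("<IB", attr, 0x04)
    if non_resident:
        raise ValueError("non-resident attribute: contents live outside the MFT record")
    # Resident attributes store the content length (4 bytes) at 0x10
    # and the content offset (2 bytes) at 0x14.
    content_len, content_off = struct.unpack_from("<IH", attr, 0x10)
    if content_off + content_len > length:
        raise ValueError("content extends past the end of the attribute")
    return attr[content_off:content_off + content_len]


def build_example_attribute() -> bytes:
    """Synthetic resident attribute: 0x18-byte header plus 0x48 content bytes."""
    header = struct.pack("<IIBBHH", 0x10, 0x60, 0, 0, 0x18, 0)  # 0x00-0x0D
    header += bytes(2)                                          # 0x0E-0x0F, zeroed
    header += struct.pack("<IH", 0x48, 0x18)                    # 0x10-0x15
    header += bytes(2)                                          # 0x16-0x17, zeroed
    return header + bytes(0x48)                                 # content
```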

MFT Attribute Example


The second entry has attribute type 0x00000030 (stored little-endian as 30 00 00 00), i.e. the $FILE_NAME attribute.

Total attribute length is 0x70 bytes. Contents start at offset 0x18.
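Decoding the $FILE_NAME contents can be sketched as below. The sketch assumes the commonly documented layout (parent folder reference at 0x00, name length in UTF-16 characters at 0x40, namespace at 0x41, UTF-16LE name from 0x42), which the slide itself does not spell out. The function names and the sample values in the usage are hypothetical.

```python
import struct


def parse_file_name(content: bytes) -> dict:
    """Decode parent reference, namespace and name from $FILE_NAME contents."""
    (parent_ref,) = struct.unpack_from("<Q", content, 0x00)
    name_chars, namespace = struct.unpack_from("<BB", content, 0x40)
    # The name length is counted in UTF-16 code units (2 bytes each).
    name = content[0x42:0x42 + 2 * name_chars].decode("utf-16-le")
    return {
        "parent_record": parent_ref & 0xFFFFFFFFFFFF,  # low 48 bits
        "parent_sequence": parent_ref >> 48,            # high 16 bits
        "namespace": namespace,
        "name": name,
    }


def build_example_content(name: str, parent_record: int,
                          sequence: int, namespace: int) -> bytes:
    """Synthetic contents; fields parse_file_name does not decode are zeroed."""
    encoded = name.encode("utf-16-le")
    return (
        struct.pack("<Q", parent_record | (sequence << 48))  # 0x00-0x07
        + bytes(0x40 - 8)                                      # 0x08-0x3F
        + struct.pack("<BB", len(encoded) // 2, namespace)     # 0x40-0x41
        + encoded                                              # 0x42 onward
    )
```

With an 11-character name such as `summary.csv`, the contents are 0x42 + 2 × 11 = 0x58 bytes, and adding the 0x18-byte header gives a 0x70-byte attribute.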

THANK YOU
