
DW2.0 By W H Inmon

DW2.0 is the next generation of data warehousing. Data warehousing began in the mid 1980s. Since then there have been many advances in architecture, technology, and information systems. Today those advances have been woven into the next generation of data warehousing.

The first generation of data warehousing featured transaction data that was integrated and placed on disk storage. There were other features of early first generation data warehousing, such as the advent of ETL. But there were also many missing features and functions that at the time were not recognized as belonging in a data warehouse.

DW2.0, the next generation of data warehousing, has many integrated features that were never found in the first generation. In addition to integrated transaction data, DW2.0 includes qualified and edited unstructured data in several forms; integrated metadata, including both business metadata and technical metadata; online high performance data that can be updated; reference and master data; and profile data records.

In addition DW2.0 contains continuous time span data.

DW2.0 recognizes the life span (the life cycle) of data as data is gathered, used, and then discarded. As data enters the system it is fresh and new. Then data starts to age. First data grows into mid life, then it passes into older life, and finally it becomes archived. All of these technologies and sophistications are tightly knit together into the DW2.0 data warehouse framework. Fig DW2.0.1 shows DW2.0.

THE LIFE CYCLE OF DATA

INMON DATA SYSTEMS, 2006, ALL RIGHTS RESERVED


Fig DW2.0.1: DW 2.0. The diagram shows four sectors: Interactive (very current; transaction data held in applications), Integrated (current++), Near line (less than current), and Archival (older). The integrated, near line, and archival sectors each contain textual subjects (internal, external), captured text, simple textual pointers, text to subject linkage, reference and master data, detailed subject data, continuous snapshot data, profile data, and summary data. Business and technical metadata accompany the sectors.

Fig DW2.0.1 shows that there are four major sectors of DW2.0: the interactive sector, the integrated sector, the near line sector, and the archival sector.

Data arrives in DW2.0 in the form of applications. As a rule these applications execute transactions. The transactions require fast response time and high availability. There normally is very little integration of data among applications.

As transaction data passes to the integrated sector, the data is integrated. Of course data can pass into the integrated sector without having passed through the interactive sector. There is a wide variety of data found in the integrated sector. The application data that came from the interactive sector is integrated and is reformed into detailed subject areas. Other data found in the integrated sector includes textual subjects, captured text, text linkages, continuous snapshots of data, and profile data.

In addition, at the integrated level there is local business metadata, local technical metadata, and enterprise metadata, usually placed in an enterprise metadata repository. The same forms of data are found in the near line sector and in the archival sector.

Data in the interactive sector is very current, up to possibly a month old. Data in the integrated sector is from a day old up to two or three years old. Data in the near line sector is from six months old to ten years old. And data in the archival sector is from five years old to infinity.

The major factors that determine the location of data are the probability of access and the need for speed of access. The interactive sector contains data that is very fast to access and has a high probability of access. The integrated sector contains data that must be accessed at a reasonably quick speed and that has a moderately high probability of access. The data found in the near line sector has a modest probability of access and a need for fairly relaxed performance. The data found in the archival environment has a very low probability of access and not much need for speed.

In fact it is common to keep some data in archives where it is perceived that the probability of access is zero. Corporations keep data for statutory purposes that has almost no hope of ever being accessed. In other cases corporations keep data just to make sure that if they ever have to access the data, it is where it can be found. Even though the probability of access is zero or close to zero, if the data ever has to be retrieved it can be restored with a minimum of investment.
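The age ranges given above for the four sectors overlap, so age alone narrows the choice of sector without settling it; probability of access and required speed decide the rest. The following is a minimal sketch of that idea in Python. The exact day counts (30 days for "a month", three years for "two or three years", 182 days for "six months") and the names `SECTOR_AGE_RANGES` and `candidate_sectors` are assumptions made for illustration, not part of DW2.0.

```python
from datetime import timedelta

# Approximate age ranges taken from the text. An upper bound of None
# means "to infinity". The day counts are illustrative assumptions.
SECTOR_AGE_RANGES = {
    "interactive": (timedelta(0), timedelta(days=30)),
    "integrated": (timedelta(days=1), timedelta(days=3 * 365)),
    "near line": (timedelta(days=182), timedelta(days=10 * 365)),
    "archival": (timedelta(days=5 * 365), None),
}


def candidate_sectors(age):
    """Return every sector whose age range contains the given age."""
    result = []
    for sector, (low, high) in SECTOR_AGE_RANGES.items():
        if age >= low and (high is None or age <= high):
            result.append(sector)
    return result


print(candidate_sectors(timedelta(days=365)))
```

For data that is one year old the sketch reports both the integrated and the near line sectors as candidates, which reflects the overlap in the ranges stated in the text.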

VOLUMES OF DATA

When looking at the bulk of data, the interactive environment contains almost a pittance of data. The integrated sector contains more data. The near line sector contains a lot of data, and the archival environment contains a whole lot of data. From a volume of data perspective then, there are big differences between the different sectors.

Each of the different sectors of DW2.0 requires technology that is optimal for that sector. There is no one technology that is a one size fits all. In general data passes whole cloth from one sector to another. There are two exceptions. The first is when application data passes to the integrated sector and must be integrated. The second is when data passes to the archival environment. For a variety of reasons data may be considerably transformed as it passes to the archival sector. Some of those reasons are
 - to remove the data from the structure and technology of data that may not be supported twenty years from now,
 - to restructure the data for the purpose of faster and more flexible access in the archival environment,
and so forth.

METADATA IN DW2.0

Local metadata exists in the many technologies that are found in each sector. Local metadata is pasteurized and is sent to the enterprise metadata repository. The enterprise metadata repository holds a snapshot of the metadata found in the local pool of metadata. If changes need to be made to the metadata, they are first made at the local level, then the metadata is shipped to the enterprise level. Metadata can be stored over time at the enterprise level, if a track of the changes to metadata over time is desired.
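The flow described above (change locally, ship a snapshot to the enterprise level, optionally keep history) can be sketched as follows. This is only an illustration under stated assumptions: the class `EnterpriseMetadataRepository`, its methods, and the source name `dbms_catalog` are hypothetical and do not come from any DW2.0 product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EnterpriseMetadataRepository:
    # history[source] is a list of (timestamp, snapshot) pairs, oldest first.
    history: dict = field(default_factory=dict)

    def receive(self, source, local_metadata, when):
        """Store a copy of the local metadata as a snapshot."""
        # Copy so that later local changes do not alter the stored snapshot.
        self.history.setdefault(source, []).append((when, dict(local_metadata)))

    def current(self, source):
        """Return the most recent snapshot received from a source."""
        return self.history[source][-1][1]

    def as_of(self, source, when):
        """Return the latest snapshot taken at or before `when`, if any."""
        candidates = [snap for ts, snap in self.history[source] if ts <= when]
        return candidates[-1] if candidates else None


repo = EnterpriseMetadataRepository()
local = {"cust_id": "customer identifier"}
repo.receive("dbms_catalog", local, datetime(2006, 1, 1))

# The change is made at the local level first, then shipped again.
local["cust_id"] = "unique customer identifier"
repo.receive("dbms_catalog", local, datetime(2006, 6, 1))
```

Keeping every snapshot rather than overwriting the previous one is what makes it possible to track changes to metadata over time.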

THE DIFFERENT COMPONENTS

DW2.0 contains many different components. The following is a description of the most interesting and the most common of the components found in DW2.0.

CAPTURED TEXT

Fig DW2.0.2 shows that captured text is part of DW2.0.


Fig DW2.0.2: Captured text. Each entry pairs a text id with a string of text, for example "The dogs of war will be unleashed at midnight when the bells toll twelve times." These are a series of qualified, edited strings of text that are about communications and other forms of textual data that are important to the corporation.

Captured text comes from the unstructured environment. Captured text may exist in the form of emails, documents, transcripts of telephone conversations, or other textual information. As a rule captured text is in the same unedited state in which it exists in the unstructured environment. However, the unstructured text has been selected for relevancy to the business environment. It would make no sense to put massive amounts of unstructured text in the DW2.0 environment unless the unstructured text is important to the business represented by DW2.0. Therefore the unstructured text that finds its way into DW2.0 has previously been edited and cleared for passage into the DW2.0 warehouse.

PROFILE DATA

Profile data is shown in Fig DW2.0.3.

Fig DW2.0.3: Profile data. Profile data is data that has been aggregated or amalgamated from other sources. Other sources may contribute one unit of data or multiple units of data. Other sources may be summarized before entering the profile record. Typical profile records are for customers.

Profile data is composite data that has been collected from a variety of sources. Profile data represents a thumbnail sketch of lots of other data. A typical type of profile data is a composite customer record. A composite customer record can be made up of data coming from a wide variety of sources. Customer purchases, customer payments, customer browsing on the Internet, and customer demographics can all be placed in a profile record for the customer. Once the profile record is created it can be accessed quickly and cleanly. There is no need to go and find and analyze all the source data when it comes time to look at a customer. While customer data is a typical subject for profiling, there are many other subjects which can be profiled.
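A composite customer record of the kind described above might be assembled along these lines. The sketch is illustrative only: the function `build_customer_profiles`, the input shapes, and the field names are assumptions, and a real profile would draw on whatever sources the corporation actually has.

```python
from collections import defaultdict


def build_customer_profiles(purchases, payments, demographics):
    """Aggregate several sources into one profile record per customer.

    purchases and payments are lists of (customer_id, amount) pairs;
    demographics maps customer_id to a dict of attributes.
    """
    profiles = defaultdict(lambda: {
        "purchase_count": 0,
        "total_purchased": 0.0,
        "total_paid": 0.0,
        "demographics": {},
    })
    # Some sources are summarized before entering the profile record.
    for customer_id, amount in purchases:
        profiles[customer_id]["purchase_count"] += 1
        profiles[customer_id]["total_purchased"] += amount
    for customer_id, amount in payments:
        profiles[customer_id]["total_paid"] += amount
    # Other sources contribute their data as-is.
    for customer_id, attributes in demographics.items():
        profiles[customer_id]["demographics"] = dict(attributes)
    return dict(profiles)


profiles = build_customer_profiles(
    purchases=[("c1", 10.0), ("c1", 5.0), ("c2", 7.0)],
    payments=[("c1", 12.0)],
    demographics={"c1": {"region": "west"}},
)
```

Once built, looking at a customer is a single lookup such as `profiles["c1"]`, with no need to revisit the source data.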

DETAILED SUBJECT AREA DATA


Fig DW2.0.4 shows detailed subject area data.

Fig DW2.0.4: Detailed subject area data. This data is detailed data that has been integrated. This data is organized by major subject area. Much of this data comes from transactions.

Detailed subject area data is at the heart of the data warehouse. Detailed subject area data is data that comes from applications and has been integrated. Once the detailed subject area data is gathered into DW2.0, the data then becomes the basis for doing business intelligence. The detailed subject area data is so granular that it can be shaped and reshaped in many different ways. The detailed subject area data supports finances, accounting, sales, marketing, engineering, human resources, and so forth. The detailed subject area data is normally found in a relational format. Each record of the detailed subject area data is time stamped.

LINKAGE TEXT TO SUBJECT

Fig DW2.0.5 shows the linkage to text data found in the DW2.0 environment.

Fig DW2.0.5: Linkage, text to subject. Each text id is paired with a key. This is primarily a two way linkage between the identifier of qualified text and the key of transaction related data.

When unstructured data is brought over to the data warehouse environment, even when it has been edited and screened, the textual data can be more useful still if it is linked to the classical transaction and structured data found in DW2.0. Typical links can be formed across email addresses and telephone numbers. Still other links can be formed across names and mutations of names. This linkage data is normally created after the textual data has been brought across to the data warehouse environment. Note that some textual data will have no linkage but will still be relevant to the business of the corporation.
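One way to form links across email addresses and telephone numbers is sketched below. The record layouts, the normalization rules, and the names `build_linkage`, `normalize_email`, and `normalize_phone` are assumptions for illustration; matching on names and mutations of names would need considerably more than this.

```python
import re


def normalize_email(email):
    """Lower-case and trim an email address so variants compare equal."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep only the digits of a telephone number."""
    return re.sub(r"\D", "", phone)


def build_linkage(texts, customers):
    """Return (text_id, key) pairs linking captured text to structured records."""
    by_email = {normalize_email(c["email"]): c["key"]
                for c in customers if c.get("email")}
    by_phone = {normalize_phone(c["phone"]): c["key"]
                for c in customers if c.get("phone")}
    links = []
    for text in texts:
        keys = set()
        if text.get("email"):
            key = by_email.get(normalize_email(text["email"]))
            if key is not None:
                keys.add(key)
        if text.get("phone"):
            key = by_phone.get(normalize_phone(text["phone"]))
            if key is not None:
                keys.add(key)
        # Text with no match simply produces no linkage rows.
        links.extend((text["text_id"], key) for key in sorted(keys))
    return links


customers = [{"key": "C1", "email": "Ann@Example.com", "phone": "303-555-0100"}]
texts = [
    {"text_id": "T1", "email": "ann@example.com"},
    {"text_id": "T2", "phone": "(303) 555-0100"},
    {"text_id": "T3"},
]
links = build_linkage(texts, customers)
```

In this sketch text `T3` has no linkage, yet it stays in the captured text; the linkage table only records the matches that were found.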

CONTINUOUS SNAPSHOT DATA

Fig DW2.0.6 shows continuous snapshot data.

Fig DW2.0.6: Continuous snapshot data. Each record carries a from date and a to date. This is a series of related records of data where one record is logically related to another record. There is no logical overlap but there is the possibility of gaps of discontinuity. In theory the time span defined is from the beginning of time to infinity. The elements of data that are contained in this structure are usually few and slow to change.

Continuous snapshot data is data that is linked together by a series of from dates and to dates. The linkage is logical, not physical. There can be no overlap of continuous snapshot data, but there can be gaps of discontinuity. Continuous snapshot data is useful when there are few variables that are slow to change. In that case a continuous definition of data can be created. Customer name and address is a typical example where continuous snapshot structuring of data applies.
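The no-overlap rule and the allowance for gaps can be checked mechanically. The sketch below assumes each record is a `(from_date, to_date, value)` tuple and that the to date is exclusive; both the layout and the function names `check_continuous_snapshots` and `value_as_of` are illustrative assumptions.

```python
from datetime import date


def check_continuous_snapshots(records):
    """Return (overlapping_records, gaps) for (from_date, to_date, value) records.

    The to_date is treated as exclusive. Overlaps violate the structure;
    gaps are permitted and are only reported.
    """
    ordered = sorted(records, key=lambda r: r[0])
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    latest_end = ordered[0][1]
    for record in ordered[1:]:
        start, end, _ = record
        if start < latest_end:
            overlaps.append(record)
        elif start > latest_end:
            gaps.append((latest_end, start))
        latest_end = max(latest_end, end)
    return overlaps, gaps


def value_as_of(records, when):
    """Return the value in effect on a given date, or None inside a gap."""
    for start, end, value in records:
        if start <= when < end:
            return value
    return None


addresses = [
    (date(2000, 1, 1), date(2003, 1, 1), "12 Oak St"),
    (date(2003, 1, 1), date(2005, 6, 1), "9 Elm Ave"),
    (date(2006, 1, 1), date(9999, 12, 31), "4 Pine Rd"),
]
overlaps, gaps = check_continuous_snapshots(addresses)
```

Here the customer address history has no overlaps and one gap, between June 2005 and January 2006, during which `value_as_of` returns nothing.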

APPLICATION DATA

Application data is shown by Fig DW2.0.7.


Fig DW2.0.7: Application data. Application data is usually unintegrated and is subject to very fast response time and high availability. Application data is usually (but not necessarily) related to transactions.

Application data is transacted in a 2 to 3 second response time environment. Application data is notoriously unintegrated. Application data allows for update of data values, as well as insertion and creation. Application data is where most data is generated in the corporation, as a byproduct of the execution of transactions.

TEXTUAL SUBJECTS

Textual subjects are depicted by Fig DW2.0.8.

Fig DW2.0.8: Textual subjects (internal, external). The figure lists example subjects with terms that belong to each:
 - Sarbanes Oxley: promise to deliver, contingency sales, delayed shipment, accounts payable, delayed billing, future feature, ...
 - Human resources: race, background, salary, education, address, previous position, age, place of birth, ...
 - Security: encryption, mask, virus, firewall, administrator, encrypted at rest, encrypted during transmission, ...
Textual subjects are the different categories of information around which text data is related. The categories can be either internal or external, or both, depending on the usage of the text.

Textual subjects are those subjects by which text from the unstructured environment is organized. Textual subjects can be internally generated or externally generated by the creation of one or more ontologies.
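Organizing text by textual subject can be illustrated with a very simple term lookup. The sketch below uses a few of the subjects and terms from Fig DW2.0.8; the substring matching rule and the names `TEXTUAL_SUBJECTS` and `subjects_for` are assumptions, and a real ontology would be far richer than a flat term list.

```python
# A small subset of the subjects and terms shown in the figure.
TEXTUAL_SUBJECTS = {
    "Sarbanes Oxley": ["promise to deliver", "contingency sales", "delayed shipment"],
    "Human resources": ["salary", "education", "previous position"],
    "Security": ["encryption", "virus", "firewall"],
}


def subjects_for(text):
    """Return the subjects whose terms appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [subject
            for subject, terms in TEXTUAL_SUBJECTS.items()
            if any(term in lowered for term in terms)]


print(subjects_for("The firewall blocked a virus."))
```

A piece of text can relate to more than one subject, or to none, which is why the function returns a list.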

REFERENCE/MASTER DATA

Fig DW2.0.9 shows the different reference data that a corporation has that is captured in the DW2.0 environment.

Fig DW2.0.9: Reference, master data. This data is the reference data that glues the corporation together.

Every corporation has reference data, and there are many forms of reference tables. When reference data applies across the entire enterprise it can be called master data.

SUMMARY DATA

Fig DW2.0.10 shows that there is a place for summary in DW2.0.

Fig DW2.0.10: Summary. Some data arrives at the data warehouse at too low a level of granularity, such as click stream data. This data needs to be passed through a granularity manager.

While most data is detailed in DW2.0, there is a place for summary data. Summary data is appropriate when the summary is widely used across the enterprise. When summary data is created, it is also wise to include the rules for summarization: what data was included, what data was excluded, how the calculation was made, and so forth.
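Carrying the rules for summarization along with the summary itself might look like the following. This is a sketch under assumptions: the function `summarize_sales`, the `amount` field, and the inclusion rule are hypothetical examples rather than anything prescribed by DW2.0.

```python
def summarize_sales(rows, min_amount=0.0):
    """Summarize sales rows and record how the summary was produced."""
    included = [row for row in rows if row["amount"] >= min_amount]
    return {
        "total": sum(row["amount"] for row in included),
        # The rules travel with the summary so it can be interpreted later.
        "rules": {
            "included": f"rows with amount >= {min_amount}",
            "excluded_count": len(rows) - len(included),
            "calculation": "sum of amount",
        },
    }


summary = summarize_sales([{"amount": 10.0}, {"amount": -2.0}, {"amount": 5.0}])
```

Anyone reading `summary["total"]` can then consult `summary["rules"]` to see that one row was excluded and why.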

SIMPLE UNSTRUCTURED POINTERS


Fig DW2.0.11 shows simple unstructured pointers in the DW2.0 environment.

Fig DW2.0.11: Simple textual pointers. Some qualified textual data is too bulky and is of such a low priority of interest that it doesn't belong in the DW2.0 environment. This unstructured data still needs to be referenced.

On occasion, pointers from the DW2.0 environment to the unstructured environment are useful. Sometimes the bulk of data found in the unstructured environment is too great to be carried in DW2.0, and yet there still may be valuable information found there. On these occasions it makes sense to carry simple pointers to the unstructured environment so that unstructured data can be accessed if necessary, however indirectly.

METADATA IN DW2.0

There is metadata in DW2.0. Metadata is one of the most important components because it is metadata that acts as the nervous system of DW2.0. Metadata describes what is in DW2.0 and how data and components in DW2.0 are related. Without metadata, the data in DW2.0 would be a big pile of essentially useless data. There are different kinds of metadata within DW2.0. Those kinds of metadata are local business metadata, local technical metadata, and enterprise metadata.

LOCAL BUSINESS METADATA

Local metadata refers to metadata found at a given component. Some examples of local metadata might be metadata found in a business intelligence tool, metadata found in a DBMS directory, metadata found in a spreadsheet, metadata found in a report, metadata found in a screen, metadata found in a data dictionary, and so forth. In each of these cases there is a technology or even just a single instance of a technology where metadata is contained within that technology. The metadata is wholly contained and managed within that technology. If metadata needs to be added or changed, it is added or changed within that technology. In other words local metadata lives an existence that is entirely self contained.

The problem with local metadata is that it is part of a larger world of which it is unaware. Any given unit of local metadata is unaware that there is lots of other metadata to which it needs to relate. That is why it is called local.

Within the category of local metadata there is business metadata and technical metadata. Business metadata is metadata that contains text that is meaningful to and useful to the business person. Technical metadata is metadata that contains text that is useful to and of interest to the technician.

Fig DW2.0.12 depicts local metadata.

Fig DW2.0.12: Local metadata is metadata that resides at the local level. The local metadata is found in different places throughout the environment. The local metadata is in many different forms and technologies.

Fig DW2.0.13 shows business metadata.

Fig DW2.0.13: Business metadata is metadata that is useful to the business person and is in the language of the business person.

Fig DW2.0.14 shows technical metadata.

Fig DW2.0.14: Technical metadata is metadata that is useful to the technician and is in the language of the technician.

There is, then, local metadata throughout every technology found in DW2.0. But there is a need for integrating local metadata. That need is met through the establishment of an enterprise wide metadata repository. An enterprise metadata repository is technology that gathers all the local metadata and assimilates it into one place. Once the local metadata is in a single place, the enterprise metadata repository allows it to be integrated. Fig DW2.0.15 shows an enterprise metadata repository.

Fig DW2.0.15: The enterprise metadata repository is the place where metadata from all over the enterprise is placed. Note that this location is not where metadata is created or updated. Instead metadata is sent to the enterprise metadata repository from different local sources.

The local and the enterprise wide metadata reflect a larger structure of metadata. Fig DW2.0.16 shows the structure of the metadata.

Fig DW2.0.18: Enterprise Metadata Repository. The diagram shows several pools of local metadata feeding the enterprise metadata repository. Each pool holds business metadata and technical metadata drawn from sources such as reports, documents, layouts, spreadsheets, email, BI universes, and DBMS.

Local metadata, both business and technical, is gathered locally or exists locally. Then the local metadata is gathered into an enterprise metadata repository. Once in the enterprise metadata repository the local metadata is edited and is organized according to the needs at the enterprise level.

About Inmon Data Systems

Founded in Colorado by Bill Inmon, Guy Hildebrand and Dan Meers, Inmon Data Systems (IDS) is a software company dedicated to the proposition that there needs to be a bridge between the worlds of structured data and unstructured data. IDS has foundation technology that allows unstructured data to be brought into the structured environment and, once there, integrated into the structured environment.

Applications
 - unstructured visualization (with Compudigm)
 - metadata consolidation
 - CRM enhancement
 - Communication compliance

IDS is located in Castle Rock, Colorado. Contact Carol Renne at 303-973-3788 for further information.
