Architecture
Lecture # 1
Corporate Data
Structured Data
• Structured data is data in a standardized format: it has a well-defined
structure, complies with a data model, follows a persistent order, and is
easily accessed by humans and programs. This type of data is generally
stored in a database.
• For example, a farmer has a machine that reads the identification of each railroad car as it passes
through the farmer's property. Trains pass through the property night and day, and the electronic eye reads and
records the passage of each car on the track.
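A minimal sketch of what "complies with a data model" could look like for the railroad car readings. The record name `CarReading`, its two fields, and the sample values are assumptions made for illustration; the slides do not specify a schema.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass(frozen=True)
class CarReading:
    """One pass of a railroad car past the electronic eye (hypothetical schema)."""
    car_id: str        # identification read from the car
    read_at: datetime  # time the car passed the reader


def field_names(record):
    """Return the record's field names in declaration order."""
    return tuple(f.name for f in fields(record))


# Invented sample readings; every record has the same shape.
readings = [
    CarReading("CAR-0001", datetime(2024, 1, 1, 2, 15)),
    CarReading("CAR-0002", datetime(2024, 1, 1, 2, 16)),
    CarReading("CAR-0003", datetime(2024, 1, 1, 14, 40)),
]

# Collect the distinct shapes seen across all readings.
shapes = {field_names(r) for r in readings}
print(shapes)
```

Because every reading shares one set of fields in one order, a program can store and query the records without inspecting each one individually, which is what makes this kind of data a natural fit for a database table.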
Nonrepetitive Unstructured Data
• Nonrepetitive unstructured data are unstructured data in which no record resembles another in form or content, such as e-mails.
• Each e-mail can be long or short. The e-mail can be in English or Spanish (or some other language). The
author of the e-mail can say anything that he/she pleases. It is only a pure accident if the contents of any
e-mail are identical to the contents of any other e-mail.
• There are many other forms of nonrepetitive unstructured data: voice recordings, contracts,
customer feedback messages, etc.
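The points about e-mail can be made concrete with a small sketch. The three message bodies below are invented for illustration only; the check simply shows that the messages vary in length and language and that none repeats another.

```python
# Hypothetical e-mail bodies, invented for illustration only.
emails = [
    "Hi, the shipment arrived late again. Can someone call me back today?",
    "Hola, gracias por la respuesta.",
    "Please find the signed contract attached. Let me know if the terms "
    "in section 4 still need changes before Friday.",
]

# Length varies freely from one message to the next.
word_counts = [len(body.split()) for body in emails]

# No message is identical to any other.
all_distinct = len(set(emails)) == len(emails)

print(word_counts)
print(all_distinct)
```

Unlike a fixed-schema record, nothing here can be read by position or field name; each message has to be interpreted on its own.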
The Great Divide of Data
It is hardly obvious why there should be this great divide between
repetitive and nonrepetitive data, but there are some very good reasons
for the divide:
• Repetitive data usually have very limited business value, while
nonrepetitive data are rich in business value.
• Repetitive data can be handled one way; nonrepetitive data are
handled very differently.
• Repetitive data can be analyzed one way, while nonrepetitive data can
be analyzed in a very different manner.
Textual/Nontextual Data
• Nonrepetitive unstructured data can be divided into textual and nontextual data.
Business Value
The Data Infrastructure