
Introduction to Data Architecture
Lecture # 1

Dr. Saif Ur Rehman Malik


Data Architecture
• A data architecture describes how data is managed, from collection
through transformation, distribution, and consumption. It sets the
blueprint for data and the way it flows through data storage systems. It
is foundational to data processing operations and artificial intelligence
(AI) applications.
Introduction (cont…)
• Corporate data include all of the data found in the corporation.
• The most basic division of corporate data is into structured data and unstructured
data.
• As a rule, there are far more unstructured data than structured data.
• Unstructured data have two basic divisions: repetitive data and nonrepetitive data.
• Big data is made up of unstructured data.
Introduction (cont…)
• Nonrepetitive big data has a fundamentally different form than repetitive
unstructured big data.
• The differences between nonrepetitive big data and repetitive big data are so large
that they can be called the boundaries of the “great divide.”
• As a rule, nonrepetitive big data has MUCH greater business value than repetitive
big data.
Data Architecture
• Data architecture is about the larger picture of data and how it fits together in a typical organization.
Subdividing Data

Corporate Data
Structured Data
• Structured data is data that is in a standardized format, has a
well-defined structure, conforms to a data model, follows a persistent
order, and is easily accessed by humans and programs. This type of data
is generally stored in a database.

• Examples: tables in a SQL (relational) database, or rows and columns in an Excel spreadsheet.
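
To make the idea of a well-defined structure concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and values are hypothetical and chosen only for illustration; the point is that every row follows the same schema, so a program can query the data by column.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (id, name, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 120.50), (2, "Bob", 75.00)],
)

# Because every row conforms to the same data model, filtering by a
# column is a single declarative query.
rows = conn.execute(
    "SELECT name, balance FROM customer WHERE balance > ? ORDER BY id",
    (100,),
).fetchall()
print(rows)
```

The same query would not be possible against free-form text, which is what separates structured data from the unstructured data described next.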


Unstructured Data
Unstructured data is information that is not arranged according to a preset data model or schema, and therefore
cannot be stored in a traditional relational database or RDBMS. Text and multimedia are two common types of
unstructured content.
Repetitive Unstructured
• A typical form of repetitive unstructured data in the corporation might be the data generated by an analog
machine.

• For example, a farmer has a machine that reads the identification of railroad cars as the railroad cars pass
through the farmer's property. Trains pass through the property night and day. The electronic eye reads and
records the passage of each car on the track.
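
The railroad example can be sketched in a few lines of Python. The log format ("timestamp,car_id" on each line) and the sample values are assumptions made for illustration, not part of the lecture. What the sketch shows is that every record has the same shape, so one small parser handles all of them.

```python
from collections import Counter

# Hypothetical log format: "<timestamp>,<car_id>" per line; values are made up.
raw_log = """\
2024-01-01T02:14:05,CAR-1001
2024-01-01T02:14:07,CAR-1002
2024-01-01T13:40:51,CAR-1001
"""


def parse_reading(line):
    """Split one log line into its two fields."""
    timestamp, car_id = line.split(",")
    return {"timestamp": timestamp, "car_id": car_id}


# The same parser works for every line because the records are repetitive.
readings = [parse_reading(line) for line in raw_log.splitlines() if line]

# Count how many times each car passed the electronic eye.
passes_per_car = Counter(reading["car_id"] for reading in readings)
print(passes_per_car)
```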
Nonrepetitive Unstructured Data
• Nonrepetitive unstructured data are data whose records do not repeat in structure or content, such as e-mails.
• Each e-mail can be long or short. The e-mail can be in English, Spanish, or some other language. The
author of the e-mail can say anything that he or she pleases. It is purely an accident if the contents of one
e-mail are identical to the contents of any other e-mail.
• There are many other forms of nonrepetitive unstructured data, such as voice recordings, contracts,
and customer feedback messages.
The Great Divide of Data
It is hardly obvious why there should be this great divide of data. But
there are some very good reasons for the divide:
• Repetitive data usually have very limited business value, while
nonrepetitive data are rich in business value.
• Repetitive data can be handled one way; nonrepetitive data are
handled very differently.
• Repetitive data can be analyzed one way, while nonrepetitive data can
be analyzed in a very different manner.
Textual/Nontextual Data

• Nonrepetitive unstructured data can be divided into textual and nontextual data.
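
The subdivisions introduced so far can be summarized as a small tree. The sketch below encodes them as a nested Python dictionary and lists every leaf category; the representation and the helper name leaf_paths are illustrative choices, not something prescribed by the lecture.

```python
# The subdivision of corporate data described in these slides, as a nested dict.
taxonomy = {
    "corporate data": {
        "structured": {},
        "unstructured": {
            "repetitive": {},
            "nonrepetitive": {"textual": {}, "nontextual": {}},
        },
    }
}


def leaf_paths(tree, prefix=()):
    """Return the path to every leaf category, in insertion order."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            # Descend into subcategories.
            paths.extend(leaf_paths(children, path))
        else:
            # No further subdivision: this is a leaf category.
            paths.append(path)
    return paths


for path in leaf_paths(taxonomy):
    print(" > ".join(path))
```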
Business Value
The Data Infrastructure
