
CHAPTER 8.4

Data Architecture: A High-Level Perspective


One of the roles of architecture is to provide a high-level perspective. From a
high-level perspective, data architecture looks like the diagram seen in
Fig. 8.4.1.

A HIGH LEVEL PERSPECTIVE


Fig. 8.4.1 shows representative components. For example, on the left-hand side,
where cathode ray tubes (CRTs) emanate from an application, the
diagram represents online transaction processing systems. In reality,
there are MANY applications and MANY databases represented by the single
application, database, and set of CRTs shown.
The diagram shows that there are two major types of big data—repetitive data
and nonrepetitive data. And of the repetitive data, there is simple repetitive data
and context-enriched repetitive data.
The typical sources of the different types of big data are shown as well.
The diagram shows that repetitive data are distilled into data that can be placed
into the analytic data warehouse environment. In addition, nonrepetitive data
can be disambiguated and placed either in the data warehouse or back into big
data as context-enriched repetitive big data.

REDUNDANCY
There are many issues raised by the diagram. One of the issues is that of
redundant data. One looks at the diagram, and it appears that there is redundant data
everywhere.
In fact, what the diagram shows are data that have been transformed. If a value of
data remains the same after transformation, then you may want to consider the data
to be redundant. Then again, you may not.

FIG. 8.4.1
A high level architecture.

Data Architecture. https://doi.org/10.1016/B978-0-12-816916-2.00029-2
© 2019 Elsevier Inc. All rights reserved.

Consider redundancy in the real world. Take the time of day. You can find the
time of day on the Internet, on the telephone, on the radio, on television, and
many other places, for that matter. Does the fact that the time of day appears
redundantly in many places become a bother? The only time it becomes a bother is if
there is no way to determine what the accurate time is. If there were no
definitive source of time, then having time appear redundantly would be a problem.
But as long as there is some definitive source somewhere and as long as most
redundant sources adhere to that definitive source, then there is no problem. In
fact, having redundant sources of time is actually quite helpful, as long as there
is no problem with the integrity of that time.
Therefore, having redundant data across the enterprise as seen in Fig. 8.4.1 is
not an issue as long as the integrity of the data is not an issue.
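
The rule above can be sketched in a few lines of Python: redundant copies are harmless as long as each one agrees with the definitive source. This is only an illustration and not part of the architecture described here. The function name `copies_out_of_sync` and the sample source names are hypothetical.

```python
def copies_out_of_sync(definitive_value, copies):
    """Return the names of redundant copies that disagree with the definitive source.

    `copies` maps a (hypothetical) source name to the value it currently reports.
    An empty result means every redundant copy agrees with the definitive value.
    """
    return sorted(name for name, value in copies.items() if value != definitive_value)


# Illustrative data only: the time of day as reported by several redundant sources.
definitive_time = "14:05"
reported = {"internet": "14:05", "radio": "14:05", "wall clock": "13:58"}

print(copies_out_of_sync(definitive_time, reported))  # ['wall clock']
```

An empty list corresponds to the harmless case: the data are redundant, but their integrity is intact.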

THE SYSTEM OF RECORD


The integrity of the data in data architecture is established by what can be called the
“system of record.” The system of record is the one place where the value of data is
definitively established. Note that the system of record applies only to detailed
granular data. The system of record does not apply to summarized or derived data.
In order to understand the system of record, think of a bank and your bank account
balance. For every account in every bank, there is a single system of record for
account balance. There is one and only one place where the account balance is
established and managed. Your bank account balance may appear in many places
throughout the bank. But there is only one place where the system of record is kept.

FIG. 8.4.2
The system of record.

The system of record moves throughout the data architecture that has been
described.
Fig. 8.4.2 depicts the movement of the system of record.
Fig. 8.4.2 shows that as data are captured, especially in the online environment,
the system of record has its first occurrence. Location 1 shows that
the system of record for current-valued data is found in the online environment.
You can think of calling the bank and asking for your account balance as it
exists right now; the bank looks into its online transaction processing
environment to find your account balance right now.
Then one day, you have an issue with a bank transaction that occurred 2 years ago.
Your lawyer requires you to go back and prove that you made a payment 2 years
ago. You can’t go to the online transaction processing environment. Instead, you
go to your record in the data warehouse. As data age, the system of record
for older data moves to the data warehouse. That is location 2 in the diagram.

Time passes and you get audited by the IRS. This time, you have to go back
10 years to prove what financial activity you had a decade ago.
Now, you go to the archival store in big data. That is location 3 in the diagram.
So, as time passes, the system of record for data changes in data architecture.
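
The movement of the system of record with age can be sketched as a small routing function. The text gives no exact cutoffs, so the thresholds below (one year and five years) are assumptions chosen only to agree with the examples above: current data online, 2-year-old data in the data warehouse, 10-year-old data in the archival store. The names `Store` and `system_of_record` are hypothetical.

```python
from datetime import date
from enum import Enum


class Store(Enum):
    ONLINE = "online transaction processing environment"  # location 1
    WAREHOUSE = "data warehouse"                           # location 2
    ARCHIVE = "archival store in big data"                 # location 3


# Assumed cutoffs, not taken from the text; a real organization sets its own.
WAREHOUSE_AFTER_DAYS = 365
ARCHIVE_AFTER_DAYS = 5 * 365


def system_of_record(record_date: date, today: date) -> Store:
    """Return the store that holds the system of record for data of a given age."""
    age_days = (today - record_date).days
    if age_days < 0:
        raise ValueError("record_date lies in the future")
    if age_days >= ARCHIVE_AFTER_DAYS:
        return Store.ARCHIVE
    if age_days >= WAREHOUSE_AFTER_DAYS:
        return Store.WAREHOUSE
    return Store.ONLINE


today = date(2024, 1, 1)
print(system_of_record(date(2023, 12, 31), today).value)  # current balance
print(system_of_record(date(2022, 1, 1), today).value)    # payment 2 years ago
print(system_of_record(date(2014, 1, 1), today).value)    # activity 10 years ago
```

At any moment each record has exactly one system of record; only the store that holds it changes as the record ages.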

DIFFERENT TYPES OF QUESTIONS


Another way to look at the data found in data architecture is in terms of what
types of questions are answered in different parts of the architecture.
Fig. 8.4.3 shows where each type of question is answered.
In location 1, detailed, up-to-the-second questions are
answered. Here is where you ask for up-to-the-second accurate account balance
information. Location 2 indicates that in the data warehouse, you look at the
historical activity that has passed through your bank account.
Location 3 is the ODS. In the ODS, you find up-to-the-second accurate
integrated information. In the ODS, you look across information such as
ALL your account information: your loans, your savings accounts, your
checking account, your IRA, and so forth.

FIG. 8.4.3
Answering different questions throughout the architecture.

In location 4, there are the data marts. In the data marts, bank
management combines your account information with thousands of other
accounts and looks at the information from the perspective of a department.
One department looks at the data in the data marts from an accounting
perspective. Another department looks at the data from the perspective of
marketing and so forth.
There is yet another perspective of data afforded by the data found in location 5.
In location 5, big data is found. There is deep history there and a variety of other
data. The kinds of analysis that can be done in location 5 are miscellaneous and
diverse.
Of course, the data and the types of analysis that can be done
differ from industry to industry. A bank has been used here
to keep the example clear. But for other industries, there are
other types of usage information.
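
The five locations discussed in this section can be summarized as a lookup table. The sketch below restates the bank example only. The names `Location`, `QUESTIONS_BY_LOCATION`, and `describe` are hypothetical, and labeling location 1 as the online environment is an assumption carried over from Fig. 8.4.2.

```python
from typing import NamedTuple


class Location(NamedTuple):
    component: str
    answers: str


# Summary of the bank example; location 1's component is assumed (see lead-in).
QUESTIONS_BY_LOCATION = {
    1: Location("online environment", "up-to-the-second detail, such as a current balance"),
    2: Location("data warehouse", "historical activity on an account"),
    3: Location("ODS", "up-to-the-second integrated information across accounts"),
    4: Location("data marts", "departmental views over many accounts"),
    5: Location("big data", "deep history and miscellaneous, diverse analysis"),
}


def describe(location_number: int) -> str:
    """Return a one-line summary of what is answered at a numbered location."""
    loc = QUESTIONS_BY_LOCATION[location_number]
    return f"Location {location_number} ({loc.component}): {loc.answers}"


for number in sorted(QUESTIONS_BY_LOCATION):
    print(describe(number))
```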

DIFFERENT COMMUNITIES
Different communities use the information found in data architecture. In
general, the clerical community uses information found in locations 1 and 2.
Everyone uses the data found in location 3. The data warehouse serves as a
crossroads for information throughout the organization. Different functional
departments use the information found in location 4. And location 5 serves
as an omnibus for the entire organization.
