Week 01-03 B. Learning Material 1

MESSAGE FROM _____________
Good day. How are you? I hope you are feeling great at home. Welcome to the
second term of this new normal. In this session, you will understand the concept of
data warehousing.
At the end of this session, you are expected to be able to:
1. define data warehousing
2. draw the three types of data warehouse architecture
Please be guided by the following:
● First, read Information Sheet 1.1.1 – Data Warehousing
a. Aggregate Data and Transactional Data
b. Data Warehouse Architecture
● Then, perform Course Activity 1.1.1
Note: You may acquire a new Learning Material only after returning the previous one.
Any updates will be announced through text message, the Facebook group page, or Google Classroom.
Write your answers on a separate clean sheet of paper. Any marks, dirt, loss, or
damage to the Self Learning Materials will be charged.
If you have questions regarding the lesson content, do not hesitate to send them
through our group chat or by text message at 09171355341.
References:
https://www.oracle.com/ph/database/what-is-a-data-warehouse/
https://docs.dhis2.org/master/en/implementer/html/aggregate-and-transactional-data.html
https://booksite.elsevier.com/9780123743695/10steps_DataCategories.pdf
https://www.astera.com/type/blog/data-warehouse-architecture/
Northlink Technological College
Learning Materials on Media and Information Literacy
Developed by: Rusiel Mae A. Silos, LPT
Information Sheet 1.1.1
Data Warehousing
Data Warehousing
A data warehouse is a repository that holds historical and cumulative information from
one or more sources. It streamlines the reporting and business intelligence (BI)
processes of a business. Rather than processing transactions, a data warehouse is
a database, usually relational, that is designed for querying and analysis.
A data warehouse typically includes historical transactional data. However, it can
contain data from other sources as well. It separates the analytical workload from
the transaction workload and allows companies to consolidate data from numerous
sources. In this way, it assists in:
● Preserving past records
● Evaluating the data to better understand and improve business operations
Data warehouses are solely intended to perform queries and analysis and often
contain large amounts of historical data. The data within a data warehouse is usually
derived from a wide range of sources such as application log files and transaction
applications.
A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Its analytical capabilities allow organizations to derive valuable business
insights from their data to improve decision-making.
The key characteristics of a data warehouse are as follows:
● Data is structured for simplicity of access and high-speed query performance.
● End users are time-sensitive and desire speed-of-thought response times.
● Large amounts of historical data are used.
● Queries often retrieve large amounts of data, perhaps many thousands of rows.
● The data load involves multiple sources and transformations.
In general, fast query performance with high data throughput is the key to a successful
data warehouse.

A. Aggregate Data and Transactional Data
Aggregate data refers to numerical or non-numerical information that is (1) collected
from multiple sources and/or on multiple measures, variables, or individuals and (2)
compiled into data summaries or summary reports, typically for the purposes of public
reporting or statistical analysis, that is, examining trends, making comparisons, or
revealing information and insights that would not be observable when data elements
are viewed in isolation. For example, information about whether individual students
graduated from high school can be aggregated (compiled and summarized) into a
single graduation rate for a graduating class or school, and annual
school graduation rates can then be aggregated into graduation rates for districts,
states, and countries.
To further illustrate the concept of aggregate data and how it may be used in public
education, consider a school with an enrollment of 500 students, which means the
school maintains 500 student records, each of which contains a wide variety of
information about the enrolled students—for example, first and last name, home
address, date of birth, gender identification, race or ethnicity, date and period of
enrollment, courses taken and completed, course grades earned, test scores, etc. (the
information collected and maintained on individual students is often
called student-level data, among other terms). Once or twice a year, the school
district may be required to submit student-enrollment reports to their state department
of education. Each school in the district will then compile a report that documents the
number of students currently enrolled in the school and in each grade level, which
requires administrators to summarize data from all their individual student records to
produce the enrollment reports. The district now has aggregate enrollment information
about the students attending its schools. Over the next five years, the school district
could use these annual reports to analyze increases or declines in district-wide
enrollment, enrollment at each school, or enrollment at each grade level. The district
could not, however, determine whether there have been increases or declines in the
enrollment of white and non-white students based on the aggregate data it received
from its schools. To produce a report showing distinct enrollment trends for different
races and ethnicities, the district schools would then need to disaggregate the
enrollment information by racial and ethnic subgroups.
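The enrollment example can be sketched in a few lines of Python. This is only an illustration: the student records, field names, and group labels below are invented, not taken from any real school system. It shows how student-level data is aggregated into counts per grade, and how keeping the subgroup in the grouping key gives the disaggregated view.

```python
from collections import Counter

# Hypothetical student-level records; names, grades, and groups are invented.
students = [
    {"name": "Ana", "grade": 7, "group": "A"},
    {"name": "Ben", "grade": 7, "group": "B"},
    {"name": "Cara", "grade": 8, "group": "A"},
    {"name": "Dan", "grade": 8, "group": "A"},
    {"name": "Eve", "grade": 8, "group": "B"},
]


def enrollment_by_grade(records):
    """Aggregate: collapse individual records into one count per grade level."""
    return dict(Counter(r["grade"] for r in records))


def enrollment_by_grade_and_group(records):
    """Disaggregate: keep the subgroup in the key so each group stays visible."""
    return dict(Counter((r["grade"], r["group"]) for r in records))


total = len(students)
by_grade = enrollment_by_grade(students)
by_grade_group = enrollment_by_grade_and_group(students)

print("Total enrollment:", total)
print("Per grade:", by_grade)
print("Per grade and group:", by_grade_group)
```

Once only the per-grade counts are reported, the per-group counts cannot be recovered from them, which is why the district in the example has to go back to the schools for disaggregated data.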

DHIS2, the health information system described in the second reference, started from
aggregate data sets for routine data and has since expanded to include patient-related
data and then data in the areas of HR, finance, logistics, and laboratory
management, moving towards operational or transactional data.
We can differentiate between transactional and aggregate data. A transactional
system (or operational system, from a data warehouse perspective) is a system that
collects, stores, and modifies detailed-level data. This system is typically used on a
day-to-day basis for data entry and validation. The design is optimized for fast insert
and update performance.
Transactional data describe an internal or external event or transaction that takes place
as an organization conducts its business. Examples include sales orders, invoices,
purchase orders, shipping documents, passport applications, credit card payments,
and insurance claims. These data are typically grouped into transactional records,
which include associated master and reference data.
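To make the contrast concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The sales_order table, its columns, and the sample rows are assumptions made for illustration. The first part does transactional work (inserting and updating individual records); the last query does analytical work (summarizing many records at once).

```python
import sqlite3

orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE sales_order ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)

# Transactional work: insert detailed records one event at a time.
with orders_db:  # commits on success, rolls back on error
    orders_db.executemany(
        "INSERT INTO sales_order (customer, amount, status) VALUES (?, ?, ?)",
        [("Acme", 100.0, "open"), ("Acme", 250.0, "open"), ("Bolt", 75.0, "open")],
    )

# Transactional work: modify a single existing record.
with orders_db:
    orders_db.execute("UPDATE sales_order SET status = 'paid' WHERE id = 1")

# Analytical work: summarize many rows into one line per customer.
totals = orders_db.execute(
    "SELECT customer, COUNT(*), SUM(amount) "
    "FROM sales_order GROUP BY customer ORDER BY customer"
).fetchall()

print(totals)
```

In a real organization the two kinds of work usually run on separate systems, so that heavy summary queries do not slow down day-to-day data entry.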
B. Data Warehouse Architecture
Data warehouse architecture defines the arrangement of data and the storing structure.
As the data must be organized and cleansed to be valuable, a data warehouse
architecture centers on identifying the most effective technique of extracting
information from raw data in the staging area and converting it into a simple
consumable structure using a dimensional model that delivers valuable business
intelligence.
When designing a company’s data warehouse, there are three main types of
architecture to take into consideration.
Single-tier architecture
A single-tier data warehouse architecture centers on
producing a compact set of data and reducing the volume of
data stored. Although it is beneficial for eliminating
redundancies, this architecture is not suitable for businesses
with complex data requirements and numerous data streams.

Two-tier architecture
This architecture separates the physical data sources from the
warehouse itself. Although it is more efficient at data storage and
organization, the two-tier architecture is not scalable. Moreover, it
supports only a limited number of users.
Three-tier architecture
This is the most common type of data warehouse architecture,
as it produces a well-organized data flow from raw information to
valuable insights.
The bottom tier typically comprises the database server, which
creates an abstraction layer over data from numerous sources,
such as the transactional databases used by front-end applications.
The middle tier includes an Online Analytical Processing (OLAP) server. From a user's
perspective, this tier transforms the data into an arrangement that is better suited to
analysis and complex querying. Because an OLAP server is built into the
architecture, we can also call it an OLAP-focused data warehouse.
(Figure source: DatawarehouseInfo)
The third and topmost tier is the client tier, which includes the tools and
Application Programming Interfaces (APIs) used for high-level data analysis, querying,
and reporting.
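The three tiers can be imitated in a small Python sketch. This is a toy model under stated assumptions, not a real warehouse: an in-memory SQLite database stands in for the bottom-tier database server, a grouping function stands in for the OLAP server, and a plain-text report stands in for the client tools. The sales table, the function names, and the numbers are all hypothetical.

```python
import sqlite3


def build_bottom_tier():
    """Bottom tier: a database holding data gathered from source systems."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("North", 2023, 120.0),
            ("North", 2024, 150.0),
            ("South", 2023, 80.0),
            ("South", 2024, 95.0),
        ],
    )
    conn.commit()
    return conn


def total_by(conn, dimension):
    """Middle tier: reshape stored data into a summary along one dimension."""
    if dimension not in {"region", "year"}:  # only allow known column names
        raise ValueError(f"unknown dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(rows.fetchall())


def report(conn, dimension):
    """Top tier: a client tool that presents the summary to the user."""
    summary = total_by(conn, dimension)
    return "\n".join(f"{key}: {value:.2f}" for key, value in summary.items())


warehouse = build_bottom_tier()
print(report(warehouse, "region"))
print(report(warehouse, "year"))
```

Note that the client function never touches the sales table directly. It only asks the middle tier for a summary, which is the separation of concerns that the three-tier design is meant to provide.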
These are the different types of data warehouse architecture. Now let’s learn about the
elements of a data warehouse (DWH) architecture and how they help build and scale a
data warehouse in detail.
Main Components of Data Warehouse Architecture
Now that we have discussed the three data warehouse architectures, let's look at the
main constituents of a data warehouse.
1. Data Warehouse Database
The central component of a data warehousing architecture is a database that stores all
enterprise data and makes it manageable for reporting. This means you
need to choose which kind of database you'll use to store data in your warehouse.
The following are the four database types that you can use:
● Typical relational databases, which are the row-oriented databases you
perhaps use on an everyday basis. For example, Microsoft SQL Server, SAP,
Oracle, and IBM DB2.
● Analytics databases, which are developed specifically for data storage that
sustains and manages analytics. For example, Teradata and Greenplum.
● Data warehouse appliances, which aren't exactly a kind of storage database;
several vendors now offer appliances that combine software for data
management with hardware for storing data. For example, SAP Hana,
Oracle Exadata, and IBM Netezza.
● Cloud-based databases, which can be hosted and accessed on the cloud so that
you don't have to procure any hardware to set up your data warehouse. For
example, Amazon Redshift, Microsoft Azure SQL, and Google BigQuery.

2. Extraction, Transformation, and Loading (ETL) Tools
Source: https://panoply.io/uploads/etl-1.png
ETL (Extract, Transform, Load) is an automated process which takes raw data,
extracts the information required for analysis, transforms it into a format that can serve
business needs, and loads it to a data warehouse. It summarizes data to reduce its
size and improve performance for specific types of analysis.
ETL tools are central to data warehouse architecture. These tools help with extracting
data from different sources, transforming it into a suitable arrangement, and loading it
into a data warehouse.
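The three ETL steps can be sketched with Python's standard library alone. This is a minimal illustration, not how any particular ETL tool works: the CSV text, the orders table, and the cleaning rules (trim and normalize customer names, skip rows whose amount is not a number) are assumptions chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with messy values on purpose.
RAW_CSV = """order_id,customer,amount
1, acme ,100.50
2,Bolt,not-a-number
3,ACME,49.50
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert types, and drop invalid rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail the simple validation rule
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


etl_db = sqlite3.connect(":memory:")
load(etl_db, transform(extract(RAW_CSV)))
loaded = etl_db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
print(loaded)
```

Because the customer names are normalized during the transform step, the two differently typed "acme" rows end up under one customer in the warehouse.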
The ETL tool you choose will determine:
● The time spent on data extraction
● Approaches to extracting data
● The kinds of transformations applied and how simple they are to apply
● Business rule definition for data validation and cleansing to improve end-product
analytics
● How missing data is filled in
● How information is distributed from the central repository to your BI
applications
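Two of the points above, business rules for validation and the filling in of missing data, can be illustrated with a short Python sketch. The rule used here (an amount must be present and non-negative) and the default value for a missing country are hypothetical; a real project would define its own rules.

```python
def validate_and_fill(records, default_country="unknown"):
    """Split records into accepted and rejected, filling gaps in accepted ones."""
    accepted, rejected = [], []
    for rec in records:
        # Hypothetical business rule: amount must be present and non-negative.
        if rec.get("amount") is None or rec["amount"] < 0:
            rejected.append(rec)
            continue
        filled = dict(rec)  # copy so the original record is left unchanged
        if not filled.get("country"):
            filled["country"] = default_country  # fill in missing data
        accepted.append(filled)
    return accepted, rejected


# Invented sample records for illustration.
sample = [
    {"id": 1, "amount": 20.0, "country": "PH"},
    {"id": 2, "amount": 15.0, "country": None},
    {"id": 3, "amount": -5.0, "country": "PH"},
    {"id": 4, "amount": None, "country": "PH"},
]

good, bad = validate_and_fill(sample)
print("accepted:", good)
print("rejected:", bad)
```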
The benefits of using an ETL Tool
ETL tools help organizations manage their data in several ways. In particular, they
excel at providing the following benefits.
● Scalability – Good ETL tools can scale up and down to accommodate the needs
of business users. In some instances, those needs center on huge batch jobs of
big datasets. In others, it could be smaller datasets for exploration.
● Real-time – ETL tools can support real-time or near-real-time operations on data.
Competitive tools enable users to specify the rate at which jobs are performed,
which can be every couple of seconds, every five minutes, or any other time
frame, to handle low-latency ETL needs.
● Automation – Although some of the automation benefits of ETL tools pertain to
their real-time capabilities, they also apply to less frequent tasks like nightly batch
jobs. With these tools, the ETL process needs to be set up only once, and then the
organization can reuse it at will.
● Governance – Credible ETL tools have governance features that are highly
important for ensuring data integrity and accuracy. Some of the more important
capabilities include data lineage for regulatory compliance (even down to the
transformation level), metadata management, and lifecycle management.
In the Extract, Load, Transform (ELT) process, you first extract the data, and then you
immediately move it into a centralized data repository. After that, the data is transformed
as needed for downstream use. Because loading does not wait for transformation, this
method can get data in front of analysts faster than ETL, and it can also simplify the
architecture.
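For comparison with ETL, here is a minimal ELT sketch, again using sqlite3 as a stand-in for the central repository. The table name, view name, and sample rows are assumptions for illustration. The raw values are loaded exactly as received, and the cleaning is done afterwards inside the repository with SQL.

```python
import sqlite3

# Hypothetical raw rows, loaded as text exactly as they arrive.
raw_rows = [("1", " acme ", "100.50"), ("2", "Bolt", "75"), ("3", "ACME", "49.50")]

elt_db = sqlite3.connect(":memory:")

# Extract + Load: store the raw data first, without cleaning it.
elt_db.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
elt_db.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)
elt_db.commit()

# Transform: performed later, inside the repository, when analysts need it.
elt_db.execute(
    """
    CREATE VIEW clean_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(TRIM(customer))     AS customer,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    """
)

summary = elt_db.execute(
    "SELECT customer, SUM(amount) FROM clean_orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(summary)
```

The raw table is still available after the transformation, so a different cleaning rule can be applied later without extracting the data again.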
Course Activity 1.1.1
Instructions: Read the questions and write your answers on one whole sheet of paper.
1. What is the difference between a data warehouse and a database? (10 pts)
2. Think of a business and draw its data warehouse architecture.
a. single-tier architecture (5 pts)
b. two-tier architecture (5 pts)
c. three-tier architecture (10 pts)

