
MESSAGE FROM _____________

Good day. How are you? I hope you are feeling great at home. Welcome to the
second term of this new normal. In this session, you will understand the concept of
data warehousing.
At the end of this session, you are expected to be able to:
1. define data warehousing
2. draw the three types of data warehouse architecture
Please be guided by the following:
● First, read Information Sheet 1.1.1 – Data Warehousing
a. Aggregate Data and Transactional Data
b. Data Warehouse Architecture
● Then, perform Course Activity 1.1.1

Note: You may get new Learning Material only after returning the previous one.
Any updates will be announced through text, the Facebook group page, or Google
Classroom.

Write your answers on a separate clean sheet of paper. Any marks, dirt, loss, or
damage to the Self Learning Materials will be charged.

If you have questions regarding the lesson content, do not hesitate to send them
through our GROUP CHATBOX or through text messaging at 09171355341.

References:
https://www.oracle.com/ph/database/what-is-a-data-warehouse/
https://docs.dhis2.org/master/en/implementer/html/aggregate-and-transactional-data.html
https://booksite.elsevier.com/9780123743695/10steps_DataCategories.pdf
https://www.astera.com/type/blog/data-warehouse-architecture/
Northlink Technological College
Learning Materials on Media and Information Literacy
Developed by: ​Rusiel Mae A. Silos, LPT
Information Sheet 1.1.1
Data Warehousing

Data Warehousing

A data warehouse is a repository that holds historical and cumulative information from
one or more sources. It streamlines the reporting and business intelligence (BI)
processes of businesses. Rather than processing transactions, a data warehouse is
typically a relational database designed for querying and analysis.

A data warehouse typically includes historical transactional data. However, it can
contain data from other sources as well. It separates the analytical workload from
the transaction workload and allows companies to consolidate data from numerous
sources. This way, it assists in:

● Preserving past records
● Evaluating the data to better understand and improve business operations

Data warehouses are solely intended to perform queries and analysis and often
contain large amounts of historical data. The data within a data warehouse is usually
derived from a wide range of sources such as application log files and transaction
applications.

A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Its analytical capabilities allow organizations to derive valuable business
insights from their data to improve decision-making.

The key characteristics of a data warehouse are as follows:

● Data is structured for simplicity of access and high-speed query performance.
● End users are time-sensitive and desire speed-of-thought response times.
● Large amounts of historical data are used.
● Queries often retrieve large amounts of data, perhaps many thousands of rows.
● The data load involves multiple sources and transformations.

In general, fast query performance with high data throughput is the key to a successful
data warehouse.

A. Aggregate Data and Transactional Data

Aggregate data refers to numerical or non-numerical information that is (1) collected
from multiple sources and/or on multiple measures, variables, or individuals and (2)
compiled into data summaries or summary reports, typically for the purposes of public
reporting or statistical analysis, that is, examining trends, making comparisons, or
revealing information and insights that would not be observable when data elements
are viewed in isolation. For example, information about whether individual students
graduated from high school can be aggregated (compiled and summarized) into a single
graduation rate for a graduating class or school, and annual school graduation rates
can then be aggregated into graduation rates for districts, states, and countries.

To further illustrate the concept of aggregate data and how it may be used in public
education, consider a school with an enrollment of 500 students. The school maintains
500 student records, each of which contains a wide variety of information about the
enrolled student: for example, first and last name, home address, date of birth,
gender identification, race or ethnicity, date and period of enrollment, courses taken
and completed, course grades earned, and test scores. (The information collected and
maintained on individual students is often called student-level data, among other
terms.)

Once or twice a year, the school district may be required to submit student-enrollment
reports to its state department of education. Each school in the district will then
compile a report that documents the number of students currently enrolled in the
school and in each grade level, which requires administrators to summarize data from
all their individual student records. The district now has aggregate enrollment
information about the students attending its schools. Over the next five years, the
school district could use these annual reports to analyze increases or declines in
district-wide enrollment, enrollment at each school, or enrollment at each grade level.
The district could not, however, determine whether there have been increases or
declines in the enrollment of white and non-white students based on the aggregate data
it received from its schools. To produce a report showing distinct enrollment trends
for different races and ethnicities, the district's schools would need to disaggregate
the enrollment information by racial and ethnic subgroups.
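
The enrollment example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records, the field names (grade_level, ethnicity), and the group labels are invented for the sketch and do not come from any real school system.

```python
from collections import Counter

# Hypothetical student-level records (invented for illustration only).
students = [
    {"name": "Ana", "grade_level": 7, "ethnicity": "Group A"},
    {"name": "Ben", "grade_level": 7, "ethnicity": "Group B"},
    {"name": "Carla", "grade_level": 8, "ethnicity": "Group A"},
    {"name": "Dino", "grade_level": 8, "ethnicity": "Group A"},
    {"name": "Ella", "grade_level": 8, "ethnicity": "Group B"},
]


def enrollment_by_grade(records):
    """Aggregate: one count per grade level; individual students are no longer visible."""
    return dict(Counter(r["grade_level"] for r in records))


def enrollment_by_grade_and_group(records):
    """Disaggregate: the same counts, broken down further by subgroup."""
    return dict(Counter((r["grade_level"], r["ethnicity"]) for r in records))


total_enrollment = len(students)
by_grade = enrollment_by_grade(students)
by_grade_and_group = enrollment_by_grade_and_group(students)

print(total_enrollment)    # 5
print(by_grade)            # {7: 2, 8: 3}
print(by_grade_and_group)  # counts keyed by (grade_level, ethnicity)
```

Note that the subgroup breakdown can only be computed from the student-level records. Once only the per-grade totals are reported, the subgroup information cannot be recovered from them.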

The DHIS2 health information system (see the docs.dhis2.org reference) started from
its familiar ground of aggregate data sets for routine data. It has since added
patient-related data and then data in the areas of HR, finance, logistics, and
laboratory management, moving towards operational or transactional data.

We can differentiate between transactional and aggregate data. A transactional
system (or operational system from a data warehouse perspective) is a system that
collects, stores, and modifies detailed-level data. This system is typically used on a
day-to-day basis for data entry and validation. The design is optimized for fast insert
and update performance.

Transactional data describe an internal or external event or transaction that takes place
as an organization conducts its business. Examples include sales orders, invoices,
purchase orders, shipping documents, passport applications, credit card payments,
and insurance claims. These data are typically grouped into transactional records,
which include associated master and reference data.
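
As a rough sketch of the difference, the following Python example uses the standard sqlite3 module as a stand-in for a transactional system. The table names, columns, and values are hypothetical. Individual sales orders are inserted and updated one at a time (transactional work), and a summary query then turns the same rows into aggregate data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Master data: customers that the transactional records refer to (hypothetical schema).
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE sales_orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER REFERENCES customers(customer_id),
           amount REAL,
           status TEXT)"""
)

# Day-to-day data entry: small inserts, each describing one business event.
with conn:
    conn.execute("INSERT INTO customers VALUES (1, 'Sample Store')")
    conn.execute("INSERT INTO sales_orders VALUES (1001, 1, 250.0, 'NEW')")
    conn.execute("INSERT INTO sales_orders VALUES (1002, 1, 100.0, 'NEW')")

# A later event modifies one detailed record.
with conn:
    conn.execute("UPDATE sales_orders SET status = 'SHIPPED' WHERE order_id = 1001")

order_status = conn.execute(
    "SELECT status FROM sales_orders WHERE order_id = 1001"
).fetchone()[0]

# Aggregate view of the same transactional rows.
order_count, total_amount = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM sales_orders"
).fetchone()

print(order_status)               # SHIPPED
print(order_count, total_amount)  # 2 350.0
```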

B. Data Warehouse Architecture

Data warehouse architecture defines how data is arranged and the structure in which
it is stored. Because data must be organized and cleansed to be valuable, a data
warehouse architecture centers on identifying the most effective technique for
extracting information from raw data in the staging area and converting it into a
simple, consumable structure, using a dimensional model, that delivers valuable
business intelligence.

When designing a company’s data warehouse, there are three main types of
architecture to take into consideration.

Single-tier architecture

A single-tier data warehouse architecture centers on producing a compact set of data
and reducing the volume of data stored. Although it is beneficial for eliminating
redundancies, this architecture is not suitable for businesses with complex data
requirements and numerous data streams.

Two-tier architecture

This architecture separates the physical data sources from the warehouse itself.
Although it is more efficient at data storage and organization, the two-tier
architecture is not scalable. Moreover, it supports only a limited number of users.

Three-tier architecture

This is the most common type of data warehouse architecture, as it produces a
well-organized data flow from raw information to valuable insights.

The bottom tier typically comprises the database server, which creates an abstraction
layer over data from numerous sources, such as the transactional databases used by
front-end applications.

The middle tier includes an Online Analytical Processing (OLAP) server. From a user's
perspective, this tier transforms the data into a structure that is better suited to
analysis and complex querying. Since an OLAP server is built into the architecture,
we can also call it an OLAP-focused data warehouse.

Source: DatawarehouseInfo

The third and topmost tier is the client level, which includes the tools and
application programming interfaces (APIs) used for high-level data analysis, querying,
and reporting.
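
The flow through the three tiers can be sketched as three small Python functions. This is a toy model, not how a real warehouse is built: the function names, the sample sales rows, and the simple roll-up standing in for an OLAP server are all assumptions made for illustration.

```python
from collections import defaultdict


def bottom_tier_rows():
    """Bottom tier: stands in for the database server holding data from many sources."""
    return [
        {"region": "North", "year": 2020, "sales": 100},
        {"region": "North", "year": 2021, "sales": 150},
        {"region": "South", "year": 2020, "sales": 80},
        {"region": "South", "year": 2021, "sales": 120},
    ]


def middle_tier_rollup(rows, dimension, measure="sales"):
    """Middle tier: an OLAP-style roll-up that sums a measure along one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


def top_tier_report(totals, title):
    """Top tier: a client tool that turns the analysis result into a readable report."""
    lines = [title]
    for key in sorted(totals):
        lines.append(f"{key}: {totals[key]}")
    return "\n".join(lines)


rows = bottom_tier_rows()
sales_by_region = middle_tier_rollup(rows, "region")
sales_by_year = middle_tier_rollup(rows, "year")
report = top_tier_report(sales_by_region, "Sales by region")
print(report)
```

The client tier never touches the raw rows directly. It only asks the middle tier for data arranged along a dimension, which is the separation the three-tier design is meant to provide.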

These are the different types of data warehouse architecture. Now let’s learn about the
elements of a data warehouse (DWH) architecture and how they help build and scale a
data warehouse in detail.

Main Components of Data Warehouse Architecture

Now that we have discussed the three data warehouse architectures, let’s look at the
main constituents of a data warehouse.

1. Data Warehouse Database

The central component of a data warehousing architecture is a database that stores all
enterprise data and makes it manageable for reporting. This means you need to choose
which kind of database you’ll use to store data in your warehouse.

The following are the four database types that you can use:

● Typical relational databases, which are the row-oriented databases you may
already use on an everyday basis. For example, Microsoft SQL Server, SAP,
Oracle, and IBM DB2.
● Analytics databases, which are developed specifically for data storage that
sustains and manages analytics. For example, Teradata and Greenplum.
● Data warehouse appliances, which aren’t exactly a kind of storage database.
Several vendors now offer appliances that combine software for data
management with hardware for storing data. For example, SAP Hana,
Oracle Exadata, and IBM Netezza.
● Cloud-based databases, which are hosted and accessed on the cloud so that
you don’t have to procure any hardware to set up your data warehouse. For
example, Amazon Redshift, Microsoft Azure SQL, and Google BigQuery.

2. Extraction, Transformation, and Loading (ETL) Tools

Source: https://panoply.io/uploads/etl-1.png

ETL (Extract, Transform, Load) is an automated process which takes raw data,
extracts the information required for analysis, transforms it into a format that can serve
business needs, and loads it to a data warehouse. It summarizes data to reduce its
size and improve performance for specific types of analysis.

ETL tools are central to data warehouse architecture. These tools help with extracting
data from different sources, transforming it into a suitable arrangement, and loading it
into a data warehouse.
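
The three ETL steps can be sketched with Python's standard csv and sqlite3 modules. This is a minimal illustration, not a real ETL tool: the raw CSV text, the fact_orders table, and the cleansing rules (skip rows with a missing amount, tidy customer names, keep only the order month) are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented for illustration).
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,100.50,2021-01-05
2,BOB,,2021-01-06
3,Carol,75.25,2021-02-10
"""


def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate, cleanse, and reshape rows before they reach the warehouse."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed validation rule: drop rows with a missing amount
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),  # tidy up customer names
                float(row["amount"]),
                row["order_date"][:7],  # keep only YYYY-MM for monthly analysis
            )
        )
    return clean


def load(conn, rows):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_month TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount, order_month FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)  # [(1, 'Alice', 100.5, '2021-01'), (3, 'Carol', 75.25, '2021-02')]
```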

The ETL tool you choose will determine:

● The time spent on data extraction
● The approaches used to extract data
● The kinds of transformations applied and how easy they are to apply
● How business rules are defined for data validation and cleansing to improve
end-product analytics
● How missing data is filled in
● How information is distributed from the central repository to your BI
applications

The benefits of using an ETL Tool

ETL tools help organizations manage their data in several ways. In particular, they
excel at providing the following benefits.

● Scalability – ​Good ETL tools can scale up and down to accommodate the needs
of business users. In some instances, those needs center on huge batch jobs of
big datasets. In others, it could be smaller datasets for exploration.

● Real-time – ETL tools are excellent for real-time operations with data.
Competitive tools enable users to specify the rate at which jobs are performed,
which can be every couple of seconds, every five minutes, or any other time
frame, to handle low-latency ETL needs.
● Automation – Although some of the automation benefits of ETL tools pertain to
their real-time capabilities, they also apply to less frequent tasks like nightly batch
jobs. With these tools, the ETL process needs to be set up only once, and then
the organization can reuse it at will.
● Governance – Credible ETL tools have governance features that are highly
important for ensuring data integrity and accuracy. Some of the more important
capabilities include data lineage for regulatory compliance (even down to the
transformation level), metadata management, and lifecycle management.

In the Extract, Load, Transform (ELT) process, you first extract the data and then
immediately move it into a centralized data repository. After that, the data is
transformed as needed for downstream use. Because loading does not wait for
transformation, this method can get data in front of analysts faster than ETL, and it
can also simplify the architecture.
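
For contrast with ETL, here is a minimal ELT sketch, again using sqlite3 as a stand-in for the central repository. The table raw_orders, the view clean_orders, and the sample rows are hypothetical. The raw data is loaded unchanged first, and the transformation is defined afterwards inside the repository as a SQL view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw values are copied into the repository as-is (all text, untidy).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
raw_rows = [("1", " alice ", "100.50"), ("2", "BOB", ""), ("3", "Carol", "75.25")]
with conn:
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: done later, inside the repository, only as needed for downstream use.
with conn:
    conn.execute(
        """
        CREATE VIEW clean_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               TRIM(customer)            AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount <> ''
        """
    )

raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
clean_rows = conn.execute("SELECT * FROM clean_orders ORDER BY order_id").fetchall()

print(raw_count)   # 3 (every raw row is available immediately after loading)
print(clean_rows)  # [(1, 'alice', 100.5), (3, 'Carol', 75.25)]
```

Keeping the raw table untouched means a different transformation can be added later without extracting the data again.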

Course Activity 1.1.1

Instructions: Read the questions and write your answers on one whole sheet of paper.

1. What is the difference between a data warehouse and a database? (10 pts)
2. Think of a business and draw its data warehouse architecture.
a. single-tier architecture (5 pts)
b. two-tier architecture (5 pts)
c. three-tier architecture (10 pts)
