Good day. How are you? I hope you are feeling great at home. Welcome to the
second term of this new normal. In this session, you will understand the concept of
data warehousing.
At the end of this session, you are expected to be able to:
1. define data warehousing
2. draw the three types of data warehouse architecture
Please be guided by the following:
• First, read Information Sheet 1.1.1 – Data Warehousing
a. Aggregate Data and Transactional Data
b. Data Warehouse Architecture
• Then, perform Course Activity 1.1.1
Note: You may acquire new learning material after returning the previous one.
Any updates will be announced through text, the Facebook group page, or Google
Classroom.
If you have questions regarding the lesson content, do not hesitate to send them
through our group chatbox or through text messaging at 09171355341.
Data Warehousing
A data warehouse is a repository that holds historical and cumulative information
from one or multiple sources. It streamlines the reporting and business intelligence
(BI) processes of a business. Rather than processing transactions, a data warehouse
is typically built as a relational database designed for querying and analysis.
Data warehouses are solely intended to perform queries and analysis and often
contain large amounts of historical data. The data within a data warehouse is usually
derived from a wide range of sources such as application log files and transaction
applications.
A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Its analytical capabilities allow organizations to derive valuable business
insights from their data to improve decision-making.
In general, fast query performance with high data throughput is the key to a successful
data warehouse.
To further illustrate the concept of aggregate data and how it may be used in public
education, consider a school with an enrollment of 500 students, which means the
school maintains 500 student records, each of which contains a wide variety of
information about the enrolled students—for example, first and last name, home
address, date of birth, gender identification, race or ethnicity, date and period of
enrollment, courses taken and completed, course-grades earned, test scores, etc. (the
information collected and maintained on individual students is often
called student-level data, among other terms). Once or twice a year, the school
district may be required to submit student-enrollment reports to their state department
of education. Each school in the district will then compile a report that documents the
number of students currently enrolled in the school and in each grade level, which
requires administrators to summarize data from all their individual student records to
produce the enrollment reports. The district now has aggregate enrollment information
about the students attending its schools. Over the next five years, the school district
could use these annual reports to analyze increases or declines in district-wide
enrollment, enrollment at each school, or enrollment at each grade level. The district
could not, however, determine whether there have been increases or declines in the
enrollment of white and non-white students based on the aggregate data it received
from its schools. To produce a report showing distinct enrollment trends for different
races and ethnicities, the district schools would then need to disaggregate the
enrollment information by racial and ethnic subgroups.
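The difference between aggregating and disaggregating can be sketched in a few lines of Python. This is only an illustration: the five student records, their field names, and their values are made up and are not taken from any real school.

```python
from collections import Counter

# Hypothetical student-level data (invented for illustration only).
students = [
    {"name": "A", "grade": 7, "ethnicity": "white"},
    {"name": "B", "grade": 7, "ethnicity": "non-white"},
    {"name": "C", "grade": 8, "ethnicity": "non-white"},
    {"name": "D", "grade": 8, "ethnicity": "white"},
    {"name": "E", "grade": 8, "ethnicity": "non-white"},
]

# Aggregate: summarize individual records into enrollment counts.
total_enrollment = len(students)
enrollment_by_grade = Counter(s["grade"] for s in students)

# Disaggregate: break the same counts down by subgroup.
# This needs the student-level records; the aggregate counts above
# no longer contain the ethnicity of each student.
enrollment_by_grade_and_group = Counter(
    (s["grade"], s["ethnicity"]) for s in students
)

print("Total enrollment:", total_enrollment)
print("By grade:", dict(enrollment_by_grade))
print("By grade and subgroup:", dict(enrollment_by_grade_and_group))
```

Once only `enrollment_by_grade` is sent to the district, the subgroup breakdown cannot be recovered from it, which is why the schools must go back to their student records.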
Transactional data describe an internal or external event or transaction that takes place
as an organization conducts its business. Examples include sales orders, invoices,
purchase orders, shipping documents, passport applications, credit card payments,
and insurance claims. These data are typically grouped into transactional records,
which include associated master and reference data.
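As a rough sketch of how a transactional record carries master and reference data with it, consider the Python example below. The class names (`Customer`, `Currency`, `SalesOrder`) and all values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Master data: a business entity that many transactions refer to."""
    customer_id: int
    name: str


@dataclass(frozen=True)
class Currency:
    """Reference data: a fixed code list used to classify transactions."""
    code: str
    name: str


@dataclass(frozen=True)
class SalesOrder:
    """Transactional data: one event that happened at a point in time."""
    order_id: int
    order_date: date
    customer: Customer
    currency: Currency
    amount: float


customer = Customer(customer_id=1, name="Example Trading")
php = Currency(code="PHP", name="Philippine peso")

orders = [
    SalesOrder(1001, date(2021, 1, 5), customer, php, 1500.0),
    SalesOrder(1002, date(2021, 1, 6), customer, php, 2500.0),
]

# Each order is a separate event, but both point to the same customer
# and currency records.
total_sales = sum(order.amount for order in orders)
print(total_sales)
```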
Data warehouse architecture defines the arrangement of data and the storing structure.
As the data must be organized and cleansed to be valuable, a data warehouse
architecture centers on identifying the most effective technique of extracting
information from raw data in the staging area and converting it into a simple
consumable structure using a dimensional model that delivers valuable business
intelligence.
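The idea of moving raw data from a staging area into a dimensional model can be sketched with Python's built-in `sqlite3` module. The table names (`staging_sales`, `dim_product`, `fact_sales`) and the sample rows are assumptions made for this example, and a real warehouse would have many more dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging area: raw data exactly as it arrived.
    CREATE TABLE staging_sales (sale_date TEXT, product TEXT, amount REAL);

    -- Dimensional model: one dimension table and one fact table.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT UNIQUE
    );
    CREATE TABLE fact_sales (
        sale_date  TEXT,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
""")

# Raw rows with inconsistent spacing and capitalization.
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?, ?)",
    [
        ("2021-01-05", " pen ", 10.0),
        ("2021-01-05", "Pen", 15.0),
        ("2021-01-06", "notebook", 40.0),
    ],
)

# Cleanse product names and fill the dimension table.
conn.execute("""
    INSERT INTO dim_product (product_name)
    SELECT DISTINCT lower(trim(product)) FROM staging_sales
""")

# Fill the fact table, linking each sale to its cleansed product.
conn.execute("""
    INSERT INTO fact_sales (sale_date, product_id, amount)
    SELECT s.sale_date, d.product_id, s.amount
    FROM staging_sales AS s
    JOIN dim_product AS d ON d.product_name = lower(trim(s.product))
""")

# A simple analytical query against the dimensional model.
sales_by_product = dict(conn.execute("""
    SELECT d.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d USING (product_id)
    GROUP BY d.product_name
""").fetchall())
print(sales_by_product)
```

After cleansing, " pen " and "Pen" are treated as the same product, so the analyst sees one total per product instead of several misspelled variants.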
When designing a company’s data warehouse, there are three main types of
architecture to take into consideration.
Single-tier architecture
Three-tier architecture
The middle tier includes an Online Analytical Processing (OLAP) server. From a user’s
perspective, this tier transforms the data into an arrangement that is more suitable for
analysis and multifaceted probing. Since it includes an OLAP server pre-built into the
architecture, we can also call it the OLAP-focused data warehouse.
Source:
DatawarehouseInfo
These are the different types of data warehouse architecture. Now that we have
discussed them, let’s look at the main elements of a data warehouse (DWH)
architecture and how they help build and scale a data warehouse.
1. Data Warehouse Database
The central component of a data warehousing architecture is a database that stores all
enterprise data and makes it manageable for reporting. This means you need to choose
which kind of database you’ll use to store data in your warehouse.
The following are the four database types that you can use:
Source: https://panoply.io/uploads/etl-1.png
ETL (Extract, Transform, Load) is an automated process that takes raw data,
extracts the information required for analysis, transforms it into a format that can serve
business needs, and loads it into a data warehouse. It often summarizes data to reduce
its size and improve performance for specific types of analysis.
ETL tools are central to data warehouse architecture. These tools help with extracting
data from different sources, transforming it into a suitable arrangement, and loading it
into a data warehouse.
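The three ETL steps can be sketched in Python using only the standard library. This is a minimal illustration, not how any particular ETL tool works: the CSV text, the function names `extract`, `transform`, and `load`, and the `sales_by_region` table are all assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical raw source data; the last row has a bad amount on purpose.
RAW_CSV = """order_id,region,amount
1,north,100.50
2,south,200.00
3,north,50.25
4,south,not-a-number
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop bad rows, normalize regions, summarize by region."""
    totals = defaultdict(float)
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        totals[row["region"].upper()] += amount
    return sorted(totals.items())


def load(conn, summary):
    """Load: write the transformed data into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(result)
```

Note that the data is cleaned and summarized before it reaches the warehouse, so only two summary rows are stored instead of the four raw rows.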
ETL tools help organizations manage their data in several ways. In particular, they
excel at providing the following benefits.
● Scalability – Good ETL tools can scale up and down to accommodate the needs
of business users. In some instances, those needs center on huge batch jobs of
big datasets. In others, it could be smaller datasets for exploration.
In the Extract, Load, Transform (ELT) process, you first extract the data, and then you
immediately load it into a centralized data repository. After that, the data is transformed
as needed for downstream use. This method typically gets data in front of analysts faster
than ETL while also simplifying the architecture.
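A minimal Python sketch of the ELT order of operations is shown below. It assumes an in-memory SQLite database as the central repository; the `raw_orders` table, the `sales_by_region` view, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, kept as text exactly as they were extracted.
raw_rows = [
    ("1", "north", "100.50"),
    ("2", "south", "200.00"),
    ("3", "north", "50.25"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: store the raw data first, with no cleaning at all.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: done later, inside the repository, only when needed.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT upper(region) AS region,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY upper(region)
""")

elt_result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(elt_result)
```

Because the raw table is kept, analysts can define other transformations later without extracting the data from the source again.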