
DAVAO DEL NORTE STATE COLLEGE

INSTITUTE OF COMPUTING
Bachelor of Science in Information Technology
New Visayas, Panabo City

IAN VAL P. DELOS REYES, MIT


JOMARIE DAVE B. JAVA

THIS MODULE IS A PILOT TEST ONLY; NOT FOR REPRODUCTION AND DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
THIS IS INTENDED ONLY FOR THE USE OF THE STUDENTS WHO ARE OFFICIALLY ENROLLED IN THE COURSE/SUBJECT.

©2021
COURSE OVERVIEW

This course pack is designed for educational administrators, school heads, and
teachers.

This course pack was realized through the initiative of CHED RO XI under its PROJECT
WRITE. Parts of this course pack were developed by the
Advance Database Systems Module Team, whose members come from
colleges and universities across the region:

• Lanie B. Laureano
• Erwin P. Acedillo
• Michelle Banawan
• Luchi A. Dela Cruz
• Vladimer Kobayashi
• Eric P. Lozarita
• Mary Cris S. Magbanua
• Arwin Bugayong Rañola

This course introduces students to modern database and data management
systems, focusing on efficient query processing and indexing
techniques for spatial, temporal, and multimedia databases. The course also
covers the analysis of large datasets (data mining).

In order for learners to gain competency in this course, this course pack has been
structured into four modules as follows:

• Review of Relational Database Concepts and SQL Commands
• Advance Database Operations
• Advance Database Concepts
• Data Warehouse and Data Mining Concepts

At the end of this course the learners are expected to:

• Recall foundational concepts of relational models and relational operators;
• Create and manipulate a database using SQL commands;
• Incorporate efficient transaction design to support data integrity within an
organization’s database management system;
• Design concurrency control methods to address integrity and security
problems within an organization’s database management system;
• Apply proper query formulation in creating SQL queries;
• Design a data warehouse model that addresses the information
requirements and decision-making goals of an organization; and
• Apply appropriate DW and DM algorithms by examining the different
techniques of warehousing and mining data that will support organizational
decisions.

Students in this course are encouraged to go through each lesson in every module
sequentially to maximize their learning. They should work on all exercises as they
build on the concepts of each topic introduced in each lesson.

To make this learning experience rewarding, study this course pack
with your co-learners at your own pace. You can also ask for the help and support of
your peers, classmates, and instructors.

Godspeed!

TABLE OF CONTENTS

MODULE 01: REVIEW OF RELATIONAL DATABASE CONCEPTS AND SQL COMMANDS
Lesson 01: SQL Statement Review
Lesson 02: Deepening SQL Functions

MODULE 02: ADVANCE DATABASE OPERATIONS
Lesson 01: Efficiency of SQL Indexes
Lesson 02: Virtual Tables: Creating a View
Lesson 03: SQL Stored Procedures
Lesson 04: Utilizing SQL Triggers

MODULE 03: ADVANCE DATABASE CONCEPTS
Lesson 01: Transaction Management and Concurrency Control
Lesson 02: Foreign Key Constraints
Lesson 03: Query Optimization

MODULE 04: DATA WAREHOUSE AND DATA MINING CONCEPTS
Lesson 01: Fundamentals of Data Warehouse
Lesson 02: Business Intelligence and Data Mining

MODULE 04

DATA WAREHOUSE AND
DATA MINING CONCEPTS

Business intelligence (BI) is the collection of
best practices and software tools developed
to support business decision making in this
age of globalization, emerging markets,
rapid change, and increasing regulation.
The complexity and range of information
required to support business decisions
has increased, and operational database
structures were unable to support all of
these requirements. Therefore, a new data
storage facility, called a data warehouse,
was developed. The data warehouse extracts
its data from operational databases as well
as from external sources, providing a more
comprehensive data pool.

Module Outcomes:
• Discuss the importance of Big
Data
• List the phases of the Big Data life cycle
• Differentiate the phases of the
Big Data life cycle
• Identify and discuss the phases of
the Big Data life cycle
• Design a data warehouse model
that addresses the information
requirements and decision-making
goals of an organization;
• Design an appropriate DW or Data
mining model for an organization
that will implement the correct
algorithm given the organization’s
strategic goal/s and the nature of
its data

Lessons in the Module:

Lesson 01: Fundamentals of Data Warehouse
Lesson 02: Business Intelligence and Data Mining

Source: Ayhan, I.E. (2020). Brown wooden hallway with gray metal doors [Photograph]. Unsplash. https://unsplash.com/photos/lVZjvw-u9V8
MODULE 04: LESSON 01

FUNDAMENTALS OF DATA
WAREHOUSE

Lesson Learning Outcomes:

• Discuss the importance of data warehousing in the business world;
• Distinguish between OLAP and OLTP; and
• Design a data warehouse model that addresses the information
requirements and decision-making goals of an organization.

Time Frame: Week 12-13

Hello students, welcome to Lesson 1 of Module 4 of Advance Database System!

This lesson discusses the concept of data warehousing and how it helps the
operations of a company, especially in decision making.

Module 04: Lesson 01
IT223 - ADVANCE DATABASE SYSTEM
FUNDAMENTALS OF DATA WAREHOUSE

ACTIVITY

Visit the following websites and possibly other URLs:

• https://www.youtube.com/watch?v=AHR_7jFCMeY
• https://www.youtube.com/watch?v=VxJYySJ_c_0

Answer the following:

1. Why is data warehousing important?

2. What are the features of data warehousing?

3. List the advantages and disadvantages of data warehousing.

ANALYSIS

1. In your own words, how is a database different from a data warehouse?

2. How does a data warehouse help in decision making?


ABSTRACTION

Data Warehouse Defined

A data warehouse is a type of data management system that is designed to enable
and support business intelligence (BI) activities, especially analytics. Data warehouses
are solely intended to perform queries and analysis and often contain large amounts
of historical data. The data within a data warehouse is usually derived from a wide
range of sources such as application log files and transaction applications.

A data warehouse centralizes and consolidates large amounts of data from multiple
sources. Its analytical capabilities allow organizations to derive valuable business
insights from their data to improve decision-making. Over time, it builds a historical
record that can be invaluable to data scientists and business analysts. Because of
these capabilities, a data warehouse can be considered an organization’s “single
source of truth.”

Figure 4.1. Data Warehouse Diagram

What is ETL?
ETL stands for Extract, Transform, and Load. It is a process that extracts data from
different source systems, transforms the data (for example, by applying calculations
or concatenations), and finally loads the data into the data warehouse system.
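
The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the source rows, the `sales` table, and the function names `extract`, `transform`, and `load` are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
SOURCE_ROWS = [
    {"first": "ana", "last": "cruz", "qty": "2", "unit_price": "150.00"},
    {"first": "ben", "last": "reyes", "qty": "1", "unit_price": "99.50"},
]

def extract():
    """Extract: read the raw records from the source system."""
    return list(SOURCE_ROWS)

def transform(rows):
    """Transform: concatenate the names and calculate each line total."""
    return [
        (
            f"{r['first'].title()} {r['last'].title()}",
            int(r["qty"]) * float(r["unit_price"]),
        )
        for r in rows
    ]

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, total FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the order of the process: each step can be checked on its own before the data reaches the warehouse.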

Data Warehouse Applications

A data warehouse helps business executives organize, analyze, and use their data
for decision making. It serves as an integral part of a plan-execute-assess
“closed-loop” feedback system for enterprise management. Data warehouses are
widely used in the following fields:

• Financial services
• Banking services
• Consumer goods
• Retail sectors
• Controlled manufacturing

Types of Data Warehouse

• Enterprise Data Warehouse (EDW): An EDW is a centralized warehouse. It provides
decision support services across the enterprise and offers a unified approach for
organizing and representing data. It also provides the ability to
classify data according to subject and to give access according
to those divisions.


• Operational Data Store (ODS): An ODS is a data store used when neither the data
warehouse nor the OLTP systems support an organization’s reporting
needs. An ODS is refreshed in real time. Hence, it
is widely preferred for routine activities such as storing employee
records.

• Data Mart: A data mart is a subset of the data warehouse. It is
specially designed for a particular line of business, such as sales
or finance. In an independent data mart, data can be
collected directly from sources.

Three types of data warehouse applications

1. Information Processing - A data warehouse allows the data stored in it
to be processed. The data can be processed by means of querying,
basic statistical analysis, and reporting using crosstabs, tables, charts,
or graphs.

2. Analytical Processing - A data warehouse supports analytical
processing of the information stored in it. The data can be analyzed
by means of basic OLAP operations, including slice-and-dice, drill-down,
drill-up, and pivoting.

3. Data Mining - Data mining supports knowledge discovery by
finding hidden patterns and associations, constructing analytical
models, and performing classification and prediction. These mining
results can be presented using visualization tools.
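
The basic OLAP operations named above can be approximated with ordinary SQL aggregation. The sketch below is an illustration only: the `sales` table, its columns, and its rows are hypothetical, and SQLite is used here in place of a dedicated OLAP engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, quarter TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2020, "Q1", "North", 100.0),
        (2020, "Q2", "North", 150.0),
        (2020, "Q1", "South", 80.0),
        (2021, "Q1", "North", 120.0),
        (2021, "Q1", "South", 90.0),
    ],
)

# Drill-up (roll-up): summarize at the coarser "year" level.
roll_up = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Drill-down: break each year into its quarters.
drill_down = conn.execute(
    "SELECT year, quarter, SUM(amount) FROM sales "
    "GROUP BY year, quarter ORDER BY year, quarter"
).fetchall()

# Slice: fix one dimension value (region = 'North') and keep the others.
slice_north = conn.execute(
    "SELECT year, SUM(amount) FROM sales WHERE region = 'North' "
    "GROUP BY year ORDER BY year"
).fetchall()

print(roll_up)
print(drill_down)
print(slice_north)
```

Drill-up and drill-down differ only in how many columns appear in `GROUP BY`, while a slice adds a `WHERE` condition on a single dimension.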

Difference Between OLTP and OLAP

OLTP and OLAP are both online processing systems. OLTP is a transactional
processing system, while OLAP is an analytical processing system. OLTP
manages transaction-oriented applications on the internet, for example, an ATM. OLAP
is an online system that responds to multidimensional analytical queries such as financial
reporting and forecasting. The basic difference between OLTP and OLAP is that
OLTP is an online database modifying system, whereas OLAP is an online database
query answering system.
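
The contrast can be seen in a small sketch. Everything in it is assumed for illustration: the `account` and `withdrawal` tables, the `withdraw` function, and the sample amounts. The OLTP part is a short transaction that modifies a few rows; the OLAP part is a read-only query that summarizes many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute(
    "CREATE TABLE withdrawal (account_id INTEGER, amount REAL, month TEXT)"
)
conn.execute("INSERT INTO account VALUES (1, 500.0)")
conn.commit()

def withdraw(conn, account_id, amount, month):
    """OLTP-style work: a short transaction that modifies the database."""
    with conn:  # commits on success, rolls back if an exception is raised
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )
        conn.execute(
            "INSERT INTO withdrawal VALUES (?, ?, ?)",
            (account_id, amount, month),
        )

withdraw(conn, 1, 100.0, "2021-01")
withdraw(conn, 1, 50.0, "2021-01")
withdraw(conn, 1, 200.0, "2021-02")

# OLAP-style work: a read-only query that aggregates historical rows.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM withdrawal GROUP BY month ORDER BY month"
).fetchall()
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(monthly_totals, balance)
```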

Figure 4.2. OLTP vs OLAP

Source: Tech Differences (n.d.). [Digital Image].
https://techdifferences.com/difference-between-oltp-and-olap.html#:~:text=OLTP%20and%20OLAP%20both%20are,is%20an%20analytical%20processing%20system.&text=The%20basic%20difference%20between%20OLTP,online%20database%20query%20answering%20system.


Comparison Chart

Figure 4.3. Comparison Chart for OLTP and OLAP

Source: Tech Differences (n.d.). [Digital Image].
https://techdifferences.com/difference-between-oltp-and-olap.html#:~:text=OLTP%20and%20OLAP%20both%20are,is%20an%20analytical%20processing%20system.&text=The%20basic%20difference%20between%20OLTP,online%20database%20query%20answering%20system.

Data Warehouse Schemas

A schema is a logical description of the entire database. It includes the name and
description of records of all record types, including all associated data items and
aggregates. Much like a database, a data warehouse also needs to maintain a schema.
A database uses the relational model, while a data warehouse uses the Star, Snowflake, or
Fact Constellation schema.

Star Schema
• Each dimension in a star schema is represented by only one dimension
table.
• This dimension table contains the set of attributes.
• Figure 4.4 shows the sales data of a company with respect to
four dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four
dimensions.
• The fact table also contains the measures, namely dollars sold and units sold.


Figure 4.4. Star Schema

Source: Tutorials Point (n.d.). [Digital Image]. https://www.tutorialspoint.com/dwh/dwh_schemas.htm

* Note - Each dimension has only one dimension table, and each table holds a set of attributes.
For example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy. For example, the cities “Vancouver”
and “Victoria” are both in the Canadian province of British Columbia, so the entries for these
cities repeat the same values in the attributes province_or_state and country.
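
The star schema described above can be sketched with SQLite. This is a hedged illustration rather than the schema in Figure 4.4: the location attributes, the four dimension keys, and the two measures follow the text, while the table names, the other dimension attributes, and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_state TEXT, country TEXT
);
-- Fact table at the center: one key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
INSERT INTO time_dim VALUES (1, 2021);
INSERT INTO item_dim VALUES (1, 'Laptop');
INSERT INTO branch_dim VALUES (1, 'Main');
INSERT INTO location_dim VALUES
    (1, '1 First St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, '2 Second St', 'Victoria', 'British Columbia', 'Canada');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 200.0, 2),
    (1, 1, 1, 2, 100.0, 1),
    (1, 1, 1, 1, 50.0, 1);
""")

# A typical star query: join the fact table to one dimension and aggregate.
sales_by_city = conn.execute("""
    SELECT l.city, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()

# The redundancy from the note: province and country repeat per city row.
repeated = conn.execute("""
    SELECT COUNT(*) FROM location_dim
    WHERE province_or_state = 'British Columbia' AND country = 'Canada'
""").fetchone()[0]

print(sales_by_city, repeated)
```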

Snowflake Schema
• Some dimension tables in the Snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike the Star schema, the dimension tables in a snowflake schema are normalized.
For example, the item dimension table in the star schema is normalized and split
into two dimension tables, namely the item and supplier tables.

Figure 4.5. Snowflake Schema


Source: Tutorials Point (n.d.). [Digital Image]. https://www.tutorialspoint.com/dwh/dwh_schemas.htm

* Note - Due to normalization in the Snowflake schema, redundancy is reduced; therefore, the schema
becomes easier to maintain and saves storage space.
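
The item and supplier split can be sketched as follows. The split itself comes from the text; the column names (including `supplier_type`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details are stored once, in their own normalized table.
CREATE TABLE supplier_dim (
    supplier_key INTEGER PRIMARY KEY, supplier_type TEXT
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key), units_sold INTEGER
);
INSERT INTO supplier_dim VALUES (1, 'Wholesale'), (2, 'Local');
INSERT INTO item_dim VALUES (1, 'Laptop', 1), (2, 'Mouse', 1), (3, 'Desk', 2);
INSERT INTO sales_fact VALUES (1, 2), (2, 5), (3, 1), (1, 1);
""")

# Reaching supplier attributes now takes one extra join through item_dim.
units_by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON i.item_key = f.item_key
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(units_by_supplier_type)
```

The trade-off is visible in the query: the supplier type is stored only once per supplier, but reading it requires an additional join compared with the star schema.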


Fact Constellation Schema

• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• Figure 4.6 shows two fact tables, namely sales and shipping.
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimensions, namely item_key, time_key,
shipper_key, from_location, and to_location.
• The shipping fact table also contains two measures, namely dollars sold and
units sold.
• It is also possible to share dimension tables between fact tables. For example,
the time, item, and location dimension tables are shared between the sales and
shipping fact tables.

Figure 4.6. Fact Constellation Schema

Source: Tutorials Point (n.d.). [Digital Image]. https://www.tutorialspoint.com/dwh/dwh_schemas.htm
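
A reduced sketch of a fact constellation is shown below, with two fact tables sharing one dimension table. Only the item dimension is modeled to keep the example short; the table names, the sample rows, and the choice to reuse the measure name `units_sold` in both fact tables (following the wording of the text) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER, units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER, shipper_key INTEGER, units_sold INTEGER
);
INSERT INTO item_dim VALUES (1, 'Keyboard'), (2, 'Mouse');
INSERT INTO sales_fact VALUES (1, 1, 5), (1, 2, 3), (2, 1, 4);
INSERT INTO shipping_fact VALUES (1, 1, 10, 6), (2, 1, 10, 4);
""")

# Aggregate each fact table separately, then line the totals up through the
# shared dimension. Joining the two fact tables directly would repeat rows.
units_by_item = conn.execute("""
    SELECT i.item_name,
           (SELECT COALESCE(SUM(s.units_sold), 0)
              FROM sales_fact AS s WHERE s.item_key = i.item_key),
           (SELECT COALESCE(SUM(h.units_sold), 0)
              FROM shipping_fact AS h WHERE h.item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_name
""").fetchall()
print(units_by_item)
```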

Data Warehouse Sample Scenario

The university computer lab’s director keeps track of lab usage, as measured by the
number of students using the lab. This function is important for budgeting purposes.
The computer lab director assigns you the task of developing a data warehouse to
keep track of the lab usage statistics. The main requirements for this database are to:
• Show the total number of users by different time periods.
• Show usage numbers by time period, by major, and by student classification.
• Compare usage for different majors and different semesters.

Complete the following problems:

a. Define the main facts to be analyzed. (Hint: These facts become the source
for the design of the fact table.)
b. Define and describe the appropriate dimensions. (Hint: These dimensions
become the source for the design of the dimension tables.)
c. Draw the lab usage star schema, using the fact and dimension structures you
defined in Problems a and b.
d. Define the attributes for each of the dimensions in Problem b.
e. Implement your data warehouse design, using the star schema you created in
Problem c and the attributes you defined in Problem d.
f. Create the reports that will meet the requirements listed in this problem’s
introduction.


Solutions:

A. Facts:

Facts are the numeric values that describe aspects of the business.

For example, the numeric figures involved in a company’s sales process,
such as product cost, product price, or total sales revenue,
are facts used in business analysis.

Facts are stored in the fact table, which sits at the center of the schema.

To answer the lab director’s questions, data is needed about
the students: the number of students in the university, the total
number of computers available, the majors, and the time periods
within a day and across a semester.

Hence, for the university lab schema, the main fact is the number of
students using the lab, summarized by time, semester, student classification, and major.
These facts are used to create the tables on which the warehouse
schema is implemented.

B. Dimensions:

Dimensions are the qualifying characteristics that give additional
perspective to the facts in the fact table.
Dimensions are important to the business because decisions are
made on the basis of dimensions.

For example, in a sales department, the sale of a product is analyzed for every
region in which it is sold.

For the university lab, the possible dimensions are
time, semester, classification, and major. Each
dimension provides a perspective on the total number of students in the fact
table.

C. Star Schema:

A star schema is designed by analyzing the facts
and the dimensions of the organization.

It takes the shape of a star: the fact table sits at the center,
and the dimension tables are arranged around it.

A star schema diagram gives a quick overview of the data warehouse.

For the university lab, the fact table is LAB, and its
dimensions are time, class, major, and semester. The schema
for the lab is shown in Figure 4.7.

D. Attributes:

Attributes are the properties of the dimensions; they
define each dimension’s structure and characteristics.


Figure 4.7. Attributes Solution

For example, for inventory the attributes can be inventory_id, product_name,
description, brand, price, etc. For payroll, the attributes can be salary
date, bonus, annual increment, etc.

The attributes of the different dimensions of the university lab system
are as follows:

1. Semester dimension: semester_description, semester_time, ending_date,
begin_date.
2. Major dimension: major_name, major_code.
3. Class dimension: class_description, class_id.
4. Time dimension: time_id, time_description, beginning_time, ending_time.

E. Data Warehouse:

A data warehouse is a collection of an organization’s stored data. Its
main focus is data storage, but the means of retrieving and
analyzing data, of extracting, transforming, and loading data, and of managing the data
dictionary are also part of a data warehousing system.

The data warehouse design drawn using the above star schema diagram is
shown below:

Figure 4.8. Data Warehouse Solution
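
One possible implementation of this design in SQLite is sketched below. The dimension attributes follow the lists in Solution D; everything else is an assumption made for illustration, including the `_dim` table-name suffix, the surrogate key `semester_id` (the attribute list gives no key for the semester dimension), the measure name `num_students`, the data types, and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE semester_dim (
    semester_id          INTEGER PRIMARY KEY,  -- assumed surrogate key
    semester_description TEXT,
    semester_time        TEXT,
    begin_date           TEXT,
    ending_date          TEXT
);
CREATE TABLE major_dim (
    major_code TEXT PRIMARY KEY,
    major_name TEXT
);
CREATE TABLE class_dim (
    class_id          INTEGER PRIMARY KEY,
    class_description TEXT
);
CREATE TABLE time_dim (
    time_id          INTEGER PRIMARY KEY,
    time_description TEXT,
    beginning_time   TEXT,
    ending_time      TEXT
);
-- Fact table: one row per combination of dimension values.
CREATE TABLE lab (
    time_id      INTEGER NOT NULL REFERENCES time_dim(time_id),
    semester_id  INTEGER NOT NULL REFERENCES semester_dim(semester_id),
    major_code   TEXT    NOT NULL REFERENCES major_dim(major_code),
    class_id     INTEGER NOT NULL REFERENCES class_dim(class_id),
    num_students INTEGER NOT NULL,             -- assumed name for the measure
    PRIMARY KEY (time_id, semester_id, major_code, class_id)
);
""")

# Hypothetical sample rows: dimensions first, then the fact that refers to them.
conn.execute(
    "INSERT INTO semester_dim VALUES "
    "(1, 'First Semester', 'AY 2021-2022', '2021-08-16', '2021-12-17')"
)
conn.execute("INSERT INTO major_dim VALUES ('IT', 'Information Technology')")
conn.execute("INSERT INTO class_dim VALUES (1, 'Freshman')")
conn.execute("INSERT INTO time_dim VALUES (1, 'Morning', '08:00', '12:00')")
conn.execute("INSERT INTO lab VALUES (1, 1, 'IT', 1, 25)")
conn.commit()

print(conn.execute("SELECT * FROM lab").fetchall())
```

Declaring the foreign keys lets the database reject a fact row that points to a dimension value that does not exist.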


F. Reports:

The university lab schema needs to provide the following information:

1. The total number of students at different times of the day or
semester. This information can be generated from the tables STUDENT,
TIME, and LAB.
2. The usage of the lab computers by major and student
category. This information can be extracted from the tables STUDENT,
MAJOR, and TIME.
3. The comparison of majors across different
semesters. This information can be extracted from the tables MAJOR
and STUDENT.

Hence, all the information required for analysis and decision making can
be generated from the university lab schema. That is, the
schema can supply all the data needed for university lab management.
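
The three reports can be sketched as aggregate queries over a star schema. The version below is a simplified assumption, not the design in Figure 4.8: it uses a `lab` fact table with a hypothetical `num_students` measure, four cut-down dimension tables, and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_id INTEGER PRIMARY KEY, time_description TEXT);
CREATE TABLE semester_dim (semester_id INTEGER PRIMARY KEY, semester_description TEXT);
CREATE TABLE major_dim    (major_code TEXT PRIMARY KEY, major_name TEXT);
CREATE TABLE class_dim    (class_id INTEGER PRIMARY KEY, class_description TEXT);
CREATE TABLE lab (time_id INTEGER, semester_id INTEGER, major_code TEXT,
                  class_id INTEGER, num_students INTEGER);
INSERT INTO time_dim VALUES (1, 'Morning'), (2, 'Afternoon');
INSERT INTO semester_dim VALUES (1, '1st Sem'), (2, '2nd Sem');
INSERT INTO major_dim VALUES ('IT', 'Information Technology'),
                             ('CS', 'Computer Science');
INSERT INTO class_dim VALUES (1, 'Freshman'), (2, 'Senior');
INSERT INTO lab VALUES (1, 1, 'IT', 1, 10), (2, 1, 'IT', 2, 5),
                       (1, 1, 'CS', 1, 7),  (1, 2, 'IT', 1, 12),
                       (2, 2, 'CS', 2, 4);
""")

# Report 1: total number of users by time period.
by_time = conn.execute("""
    SELECT t.time_description, SUM(f.num_students)
    FROM lab AS f JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY t.time_id, t.time_description
    ORDER BY t.time_id
""").fetchall()

# Report 2: usage by time period, major, and student classification.
by_time_major_class = conn.execute("""
    SELECT t.time_description, m.major_name, c.class_description,
           SUM(f.num_students)
    FROM lab AS f
    JOIN time_dim  AS t ON t.time_id = f.time_id
    JOIN major_dim AS m ON m.major_code = f.major_code
    JOIN class_dim AS c ON c.class_id = f.class_id
    GROUP BY t.time_description, m.major_name, c.class_description
    ORDER BY t.time_description, m.major_name, c.class_description
""").fetchall()

# Report 3: compare usage for different majors and semesters.
by_major_semester = conn.execute("""
    SELECT m.major_name, s.semester_description, SUM(f.num_students)
    FROM lab AS f
    JOIN major_dim    AS m ON m.major_code = f.major_code
    JOIN semester_dim AS s ON s.semester_id = f.semester_id
    GROUP BY m.major_name, s.semester_description
    ORDER BY m.major_name, s.semester_description
""").fetchall()

print(by_time)
print(by_time_major_class)
print(by_major_semester)
```

Each report follows the same pattern: join the fact table to the dimensions named in the requirement, group by their descriptive attributes, and sum the measure.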

APPLICATION

This Application will be done by group, based on the groupings set during the
approval of your Midterm Major Assessment Project Proposals. To comply with this
requirement, you must perform the following assessment tasks:

Victoria Ephanor manages a small product distribution company. Because the
business is growing fast, she recognizes that it is time to manage the vast information
pool to help guide the accelerating growth. Ephanor, who is familiar with spreadsheet
software, currently employs a sales force of four people. She asks you to develop a
data warehouse application prototype that will enable her to study sales figures by
year, region, salesperson, and product. (This prototype will be used as the basis for a
future data warehouse database.)

1. Identify the appropriate fact table components.
2. Identify the appropriate dimension tables.
3. Draw a star schema diagram for this data warehouse.
4. Identify the attributes for the dimension tables that will be required to solve
this problem.
5. Implement your data warehouse design, using the star schema you created in
Problem No. 3 and the attributes you defined in Problem No. 4.
6. Create the reports that will meet the requirements listed in this problem’s
introduction.

Save this assessment task as a PDF file with the following format:
• File name - [Subject Title] [Course-year-set] [Last Names (Separated through
commas)] - Module [number] (Lesson [number]) ex. IT223 BSIT3Z Dela
Cruz, Doe, Rizal - Module 04 (Lesson 01).pdf
• Size - Letter (8.5 x 11 inches)
• Orientation - Portrait
• Margin - 1 Inch all sides
• Font - Style: Arial; Size: 10
• Spacing - Single

* State your full names, course, year, set, subject enrolled, and the title of the
approved project in the first line/s (upper left corner) of the file. Also optimize your
image/s and crop only the essential parts so that the file fits within the maximum file size allowed for
LMS submission. Although this assessment is done as a group, each member must still
submit individually on the LMS. Failure to follow these instructions will invalidate your submission for this
assessment.

Congratulations on completing the 1st lesson of the 4th module in Advance Database System!

Source: Nowakowski, A. (2020). Man sitting in front of MacBook Pro [Photograph]. Unsplash. https://unsplash.com/photos/D4LDw5eXhgg

MODULE 04: LESSON 02

BUSINESS INTELLIGENCE AND
DATA MINING

Lesson Learning Outcomes:

• Describe the concepts and techniques in Business Intelligence;
• Explain different data mining techniques; and
• Become familiar with the different BI tools used in visualization, analysis,
dashboarding, and reporting.

Time Frame: Week 13-14

Hello students, welcome to Lesson 2 of Module 4 of Advance Database System!

This lesson provides an overview of BI and demonstrates how it facilitates effective
implementation of organizational strategies through better business decision
making.

Module 04: Lesson 02
IT223 - ADVANCE DATABASE SYSTEM
BUSINESS INTELLIGENCE AND DATA MINING

ACTIVITY

Visit the following websites and possibly other URLs:

• https://www.youtube.com/watch?v=AHR_7jFCMeY

Answer the following:

1. What is Business Intelligence and why is it important?

2. What are sample applications that use data mining?

ANALYSIS

1. Which of the following statements about Business Intelligence (BI) is true?

A. BI converts raw data into meaningful information.
B. BI has a direct impact on an organization’s strategic, tactical, and operational
business decisions.
C. BI tools perform data analysis and create reports, summaries, dashboards,
maps, graphs, and charts.
D. All of the above

2. Business intelligence (BI) is a broad category of application programs which
includes _____________

A. Decision support
B. Data Mining
C. OLAP
D. All of the above

3. Which of the following is an essential process in which the intelligent methods are
applied to extract data patterns?

A. Warehousing
B. Data Mining
C. Text Mining
D. Data Selection

4. What are the functions of Data Mining?

A. Association analysis
B. Classification analysis
C. Regression analysis
D. Clustering analysis
E. All of the above


ABSTRACTION

What is business intelligence?

Business Intelligence (BI) is a set of processes, architectures, and technologies that
convert raw data into meaningful information that drives profitable business actions.
It is a suite of software and services to transform data into actionable intelligence and
knowledge.

BI has a direct impact on an organization’s strategic, tactical, and operational business
decisions. BI supports fact-based decision making using historical data rather than
assumptions and gut feeling.

BI tools perform data analysis and create reports, summaries, dashboards, maps,
graphs, and charts to provide users with detailed intelligence about the nature of the
business.

Figure 4.9. Business Intelligence Visualization using MS PowerBI

Business Intelligence Techniques


There are several business intelligence techniques companies can put to use to gain
valuable insights to inform decision-making. Here’s a look at the most common BI
techniques.

1. Analytics - Analytics is a business intelligence technique that
involves the study of available data to extract meaningful insights
and trends. This is a popular BI technique since it lets businesses
deeply understand the data they have and derive value
from data-driven decisions. For instance, a marketing organization
can use analytics to establish the customer segments that are
highly likely to convert to new customers, and call centers can use
speech analytics to monitor customer sentiment, improve the
customer experience, and support quality assurance.

2. Predictive Modeling - Predictive modeling is a BI technique that
utilizes statistical techniques to create models that can be used
in forecasting probabilities and trends. With predictive modeling,
it is possible to predict the value for a particular data item as well
as its attributes using multiple statistical models.


3. OLAP - Online analytical processing is a technique for solving
analytical problems with different dimensions. The most important
value in OLAP is its multidimensional aspect, which lets users identify
problems from different perspectives. OLAP can be used to
complete tasks such as budgeting, CRM data analysis, and financial
forecasting.

4. Data Mining - Data mining is a technique for discovering patterns
in huge datasets and often incorporates database systems,
statistics, and machine learning to find these patterns. Data mining
is an integral process for data management as well as the
pre-processing of data since it ensures appropriate data structuring.
End users can also use data mining to create models that reveal
these patterns. For instance, a business could mine CRM data
to predict which leads will most likely buy a certain solution or
product.

5. Model Visualization - The model visualization technique is used to
transform the discovered facts into histograms, plots, charts, and
other visuals that aid in proper interpretation of the insights.
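
As a small illustration of the predictive modeling technique in the list above, the sketch below fits a straight line to past values by ordinary least squares and uses it to forecast the next period. The monthly sales figures and the function names `fit_line` and `forecast` are hypothetical, and a straight-line model is only one of many statistical models that could be used.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast(slope, intercept, x):
    """Predict the value at x using the fitted line."""
    return slope * x + intercept

# Hypothetical monthly sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
next_month = forecast(slope, intercept, 6)
print(slope, intercept, next_month)
```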

What are BI Tools?


BI tools are types of software used to gather, process, analyze, and visualize large
volumes of past, current, and future data in order to generate actionable business
insights, create interactive reports, and simplify the decision-making processes.
These tools include key features such as data visualization, visual analytics, interactive
dashboarding and KPI scorecards. Additionally, they enable users to utilize automated
reporting and predictive analytics features based on self-service.

What is Big Data?


The term big data can be defined simply as large data sets that outgrow simple
databases and data handling architectures. For example, data that cannot be easily
handled in Excel spreadsheets may be referred to as big data.

Big data involves the process of storing, processing and visualizing data. It is essential
to find the right tools for creating the best environment to successfully obtain valuable
insights from your data.

Setting up an effective big data environment involves utilizing infrastructural
technologies that process, store and facilitate data analysis. Data warehouses,
modeling language programs and OLAP cubes are just some examples. Today,
businesses often use more than one infrastructural deployment to manage various
aspects of their data.

Big data often provides companies with answers to the questions they did not know
they wanted to ask: How has the new HR software impacted employee performance?
How do recent customer reviews relate to sales? Analyzing big data sources illuminates
the relationships between all facets of your business.

Therefore, there is inherent usefulness to the information being collected in big data.
Businesses must set relevant objectives and parameters in place to glean valuable
insights from big data.

Examples of Big Data

• The New York Stock Exchange generates about one terabyte of new
trade data per day.
• Social media: statistics show that more than 500 terabytes of new data are
ingested into the databases of the social media site Facebook every day. This
data is mainly generated through photo and video uploads, message exchanges,
comments, etc.

What is data mining?


Data Mining is the art and science of discovering useful innovative patterns from data.
There is a wide variety of patterns that can be found in the data. Data mining exposes
patterns in massive datasets that can provide valuable business intelligence.

Data Mining Techniques

1. Classification - it divides large datasets into specific categories. One of the most
common uses of classification is filtering emails into “spam” or “non-spam.”
Classification is a form of “pattern recognition”.

2. Association - a technique to uncover how items are associated with each other.
Association rules are “if-then” statements that help show the probability of
relationships between data items within large data sets in various types of databases.
For example: “If a customer buys bread, they are 70% likely to also buy milk.”

3. Clustering - the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar to each other than to those in other
groups (clusters). Clustering helps identify groups of houses on the basis of their
value, type, and geographical location. It is also used to study earthquakes: based on
the areas hit by earthquakes in a region, clustering can help analyze the next probable
location where an earthquake may occur.

4. Regression - used to estimate the value (or likelihood) of a certain variable, given
the presence of other variables. It is a way to find trends in data. For example, “What
would be the total number of COVID-positive cases in the next 3 months, based on the
12 months of recorded COVID-positive cases?” More specifically, regression’s main
focus is to help you uncover the exact relationship between two (or more) variables in
a given data set.
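
As a minimal sketch of two of the techniques above (association and regression), the
Python snippet below computes the confidence of an "if bread, then milk" rule and fits
a straight-line trend to monthly counts. All data values are made up for illustration,
and the helper names rule_confidence and fit_line are hypothetical (they do not come
from any library); a real project would normally rely on a dedicated data mining
package.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule "if antecedent then consequent".

    Computed as (baskets with both items) / (baskets with the antecedent).
    """
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0  # the rule never fires, so report zero confidence
    with_both = sum(1 for t in with_antecedent if consequent in t)
    return with_both / len(with_antecedent)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up shopping baskets: 10 contain bread, and 7 of those also contain milk.
baskets = [{"bread", "milk"}] * 7 + [{"bread", "eggs"}] * 3 + [{"milk"}] * 2
confidence = rule_confidence(baskets, "bread", "milk")
print(f"bread -> milk confidence: {confidence:.0%}")

# Made-up monthly case counts for months 1..12 (exactly linear to keep the
# arithmetic easy to check by hand; real data would be noisy).
months = list(range(1, 13))
cases = [100 + 10 * m for m in months]
slope, intercept = fit_line(months, cases)
forecast = [slope * m + intercept for m in (13, 14, 15)]
print("forecast for months 13-15:", forecast)
```

The confidence value corresponds to the "70% likely" figure in the bread-and-milk
rule, and the fitted line is extended three months ahead in the same spirit as the
COVID example. A straight line is only one possible model; it is assumed here for
simplicity.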

Business Intelligence vs. Data Mining


Business intelligence and data mining differ in a few core ways, namely in purpose,
volume, and results. The purpose of business intelligence is to convert data into useful
information for executives. Business intelligence tracks key performance indicators
and presents data in a way that encourages data-driven decisions. By contrast, data
mining is geared towards exploring data and finding solutions to particular business
issues. Data mining has the computational intelligence and algorithms to detect
patterns that are interpreted and presented to management via business intelligence.

In that same vein, data mining is best suited to processing datasets concentrated
on a particular department, customer segment, or competitor(s). By analyzing these
smaller datasets, data mining can reveal hidden answers to specific business questions.
Unlike the specificity of data mining, business intelligence processes dimensional or
relational databases in order to deduce how an enterprise is performing on the whole.

Since data mining is more oriented towards getting data into a usable format and
resolving unique business problems, the results of data mining are unique datasets.
Conversely, business intelligence results are presented in charts, graphs, dashboards,
and reports. Displaying BI results is vital to influence data-driven decisions.

Lastly, data mining and business intelligence differ in their focus. Studying patterns
through data mining helps companies develop new KPIs for business intelligence.
Business intelligence is therefore focused on showing progress towards data mining-
defined KPIs. Broad metrics like total revenue, total customer support tickets, and
ARR over time paint a holistic picture of company performance and give stakeholders
the confidence to make significant decisions.

Feature | Data Mining | Business Intelligence
Purpose | Exploring and formatting data to find answers to business problems | Interpreting and presenting data to stakeholders to inform data-driven decisions
Volume | Processes small, specific datasets for focused analysis | Processes relational databases to track enterprise-level metrics
Results | Unique datasets in a usable data format | Dashboards, graphs, charts, reports
Focus | Identifying new KPIs | Demonstrating KPI progress

Table 4.1. Difference between Business Intelligence and Data Mining

APPLICATION

Look for a partner within your section/set and answer the following questions:

1. List at least two (2) problems or scenarios that use classification, association,
clustering, and regression techniques.
2. How do business intelligence and big data relate and compare?
3. How do data mining and business intelligence work together?
4. List at least five (5) BI tools and describe their features.

Save this assessment task as a PDF file with the following format:
• File name - [Subject Title] [Course-year-set] [Last Name/s] - Module [number]
(Lesson [number]) ex. IT223 BSIT3Z Dela Cruz & Doe - Module 04 (Lesson
02).pdf
• Size - Letter (8.5 x 11 inches)
• Orientation - Portrait
• Margin - 1 Inch all sides
• Font - Style: Arial; Size: 10
• Spacing - Single

* State your full name/s (arranged in ascending order), course, year, set, and subject enrolled in
on the first line/s (upper left corner) of the file. Make sure also to optimize your image/s and crop
only the essential parts so that the file fits within the maximum file size allowed for LMS
submission. Though this assessment is done by pair, submission on the LMS shall still be on an
individual basis. Failure to follow these instructions will invalidate your submission for this
assessment.

Congratulations on completing the 2nd lesson of the 4th module in Advance Database System!

MODULE
SUMMARY
• A transaction is a sequence of actions, performed by a single user or application program,
that reads or updates the contents of the database.
• There are properties that all transactions should follow and possess. The four basic properties
are collectively termed the ACID properties (atomicity, consistency, isolation, durability).
• Concurrency control in a database management system is the procedure of managing
simultaneous operations so that they do not conflict with each other. It ensures that database
transactions are performed concurrently and accurately, producing correct results without
violating the data integrity of the database.
• A foreign key is a column (or combination of columns) in a table that points to the primary key
of another table.
• A foreign key ensures that values in one table are present in another table.
• SQL query optimization is the process of writing thoughtful SQL queries to improve database
performance.
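
The foreign key points above can be illustrated with a minimal sketch using Python's
built-in sqlite3 module. The department and employee tables, their columns, and the
helper try_insert_orphan are hypothetical names chosen for this example. Note that
SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued; other
database systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: each department has a primary key.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# Child table: dept_id is a foreign key pointing to department.dept_id.
conn.execute(
    """
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
    """
)

conn.execute("INSERT INTO department VALUES (1, 'IT')")
conn.execute("INSERT INTO employee VALUES (100, 'Ana', 1)")  # valid: department 1 exists


def try_insert_orphan(connection):
    """Try to add an employee whose department does not exist.

    Returns True if the row was accepted, False if the database rejected it.
    """
    try:
        connection.execute("INSERT INTO employee VALUES (101, 'Ben', 99)")
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_orphan(conn)
print("orphan row accepted:", accepted)
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print("employees stored:", employee_count)
```

The second insert is rejected because department 99 is not present in the parent
table, which is the guarantee described in the foreign key bullets.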

REFERENCES
KidOstrichPerson7 (n.d.). DBMS Concurrency Control. Retrieved March 12, 2021, from
https://www.coursehero.com/file/64141989/DBMS-Concurrency-Controldocx/
Mode Analytics, Inc. (n.d.). Performance Tuning SQL Queries. Retrieved March 30, 2021, from
https://mode.com/sql-tutorial/sql-performance-tuning/
Singh, A. (2020). Top 10 SQL Query Optimization Tips to Improve Database Performance. Retrieved
March 30, 2021, from https://www.mantralabsglobal.com/blog/sql-query-optimization-tips/
