
ARBA MINCH UNIVERSITY

INSTITUTE OF TECHNOLOGY
Assignment of Data Mining and Data Warehouse
DEPARTMENT OF COMPUTING AND SOFTWARE ENGINEERING
Section G5SE, A
Group members:

NAME ID

1. Wondu Gashahun-----------------------------------RAMIT/1858/10

2. Etsubdink Ambel-----------------------------------RAMIT/1647/10

3. Fekadu Debelu--------------------------------------RAMIT/1654/10

4. Hana Tesfaye----------------------------------------RAMIT/1688/10

5. Yohannis Baye---------------------------------------RAMIT/1872/10

Submitted to: Assist. Prof. Addisu M.
Submission date: March 28, 2022 GC
6. What is the difference between:

a. Data warehouse and database
b. Data warehouse and data marts
c. OLAP and OLTP

A. What is the difference between a data warehouse and a database?

Parameter | Database | Data Warehouse
Purpose | Designed to record data. | Designed to analyze data.
Processing Method | Uses Online Transactional Processing (OLTP). It supports operational processes. | Uses Online Analytical Processing (OLAP). It supports analysis and performance reporting.
Usage | Helps to perform the fundamental operations of your business. | Allows you to analyze your business.
Tables & Joins | Tables and joins are complex because they are normalized. | Tables and joins are simple because they are denormalized.
Orientation | An application-oriented collection of data. | A subject-oriented collection of data.
Storage limit | Generally limited to a single application. | Stores data from any number of applications.
Availability | Data is available in real time. | Data is refreshed from source systems as and when needed.
Design | ER modeling techniques are used for designing. | Data modeling techniques are used for designing.
Technique | Capture and maintain the data. | Analyze or explore the data.
Data Type | Stores up-to-date, current data. | Stores current and historical data, possibly covering multiple years; it may not be up to date.
Storage of data | A flat relational approach is used for data storage. | Uses a dimensional and normalized approach for the data structure. Example: star and snowflake schema.
Query Type | Simple transaction queries are used. | Complex queries are used for analysis purposes.
Data Summary | Detailed data is stored: primitive and highly detailed. | Highly summarized data is stored: summarized and consolidated.
Size limit | 100 MB to GB. | 100 GB to TB.
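The "Storage of data" row names the star schema as an example of the dimensional approach. A minimal sketch of that idea, using Python's built-in sqlite3 module, is given here. The table names (dim_product, dim_date, fact_sales), the columns, and all values are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2021, "Q4"), (2, 2022, "Q1")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 1200.0), (1, 2, 900.0), (2, 2, 300.0)])


def sales_by_category_and_year(conn):
    """Analytical query: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_category_and_year(conn))
```

Each join goes from the fact table to exactly one dimension table, which is why the joins in a warehouse stay simple even when the query is analytical.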

Applications of Database
Sector | Usage
Banking | Used in the banking sector for customer information, account-related activities, payments, credit cards, etc.
Airlines | Used for reservations and schedule information.
Universities | Stores student information, course registrations, colleges, and results.
Telecommunication | Stores call records, monthly bills, balance maintenance, etc.
Finance | Stores information related to stocks, sales, and purchases of stocks and bonds.
Sales & Production | Used for storing customer, product, and sales details.
Manufacturing | Used for data management of the supply chain and for tracking the production of items.
HR Management | Details about employees' salaries, deductions, generation of paychecks, etc.

B. What is the difference between data warehouse and data marts?

A data mart is a subset of a data warehouse oriented to a specific business line. Data marts contain
repositories of summarized data collected for analysis on a specific section or unit within an organization,
for example, the sales department.

A data warehouse is a large centralized repository of data that contains information from many sources
within an organization. The collated data is used to guide business decisions through analysis, reporting,
and data mining tools.

Figure 1 Data warehouse and Data mart
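The relationship described above (a data mart as a summarized, department-specific subset of the warehouse) can be sketched as follows. This is only an illustration with sqlite3; the table names warehouse_facts and sales_mart, the columns, and the numbers are assumptions, not part of the assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding facts from several departments.
conn.execute("CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO warehouse_facts VALUES (?, ?, ?)", [
    ("sales", "north", 100.0),
    ("sales", "north", 50.0),
    ("sales", "south", 70.0),
    ("finance", "north", 999.0),
])


def build_sales_mart(conn):
    """Materialize a summarized subset of the warehouse for the sales department only."""
    conn.execute("""
        CREATE TABLE sales_mart AS
        SELECT region, SUM(amount) AS total_amount
        FROM warehouse_facts
        WHERE department = 'sales'
        GROUP BY region
    """)


build_sales_mart(conn)
mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

The mart holds fewer rows than the warehouse and only the sales subject area, which matches the top-down approach in which marts are derived from an existing warehouse.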

Difference between Data Warehouse and Data Mart

Parameter | Data warehouse | Data mart
Definition | Collects data from various data sources. It is a large repository of data collected from different organizations or departments within a corporation. | Generally stores data from a data warehouse only. It is a subtype (subset) of a data warehouse.
Objective | Centralizes data and becomes the single source of truth across the business (a centralized system). Becomes a single source of all data needed. Provides an integrated environment and a coherent picture of the business at a point in time. | Provides easy access to data for a department or specific line of business (a decentralized system). Becomes a source for a particular subject only, such as a particular department. Mostly used in a business division at the department level.
Focus | Broadly focused on all the departments; it can even represent the entire company. | Subject-oriented and used at the department level.
Uses | Business-wide analysis | Department-specific analysis
Decision types | Strategic decision-making | Operational or tactical decision-making
Scope | Wide; contains data from all departments and lines of business | Specific; individual data marts for individual departments
Size | Typically more than 100 GB, up to 1 TB or more | Less than 100 GB
Data held and/or range | Typically enterprise-wide and ranges across multiple areas. Holds all organizational data, such as raw data, metadata, and summary data. Because it covers a large area of the corporation, the data takes a long time to process. | Limited to a single focus for one line of business. Handles only a small amount of data, typically summarized, so it takes less time to process. Data marts are easy to use, design, and implement.
Data sources | Dozens or hundreds. Stores data from multiple sources; an enterprise-wide repository of disparate data sources. | Typically just a few. Includes data from just a few sources; covers a single subject or functional organization area.
Time to implement | Months to years (on-premises); days to weeks (cloud-based). Has a long life. | Weeks to months (on-premises); days to weeks (cloud-based). Has a shorter life than a warehouse.
Cost | $100K+ (on-premises); on-demand pricing varies (SaaS) | $10K (on-premises); on-demand pricing varies (SaaS)
Data storing | Designed to store enterprise-wide decision data. | Dimensional modeling and star schema design are employed to optimize the performance of the access layer.
Nature | Data-oriented in nature: stores all of the data related to the company. | Project-oriented in nature: stores data specific to a particular department only.
Design | The design process of creating schemas and views in a data warehouse is complicated. | The design process of creating schemas and views in a data mart is quite easy.
Subject-area | Provides an integrated environment and a coherent picture of all the departments in the business at a point in time. | Mostly holds only one subject area, for example, the Sales department, the Finance department, or sales figures.
Data type | Time variance and non-volatile design are strictly enforced. The data stored in a data warehouse is more detailed than the data in a data mart. | Built for particular user groups, so the data is short, limited, and in summarized form. Mostly includes consolidated data structures to meet the subject area's query and reporting needs.
Normalization | Light denormalization takes place. Modern warehouses are mostly denormalized for quicker data querying and read performance. | Heavy denormalization takes place. There is no preference between a normalized and a denormalized structure.
Building | Building a warehouse is difficult. | Building a mart is easy.
Schema used | Fact constellation schema is used. | Star schema and snowflake schema are used.
Flexibility | A data warehouse is flexible. | A data mart is not flexible.
Model | Top-down model. Data is first loaded into a data warehouse, and data marts are then created from it. This approach is generally followed by, and best for, big companies, since they have more data than small to medium-sized companies. The approach selection is based on the company size. | Bottom-up model. Data marts are created first, and data warehouses are then created from them. This strategy works best for small to medium-sized companies.
Data value | Read-only from the end users' standpoint. | Transaction data, regardless of grain, fed directly from the data warehouse.

C. What is the difference between OLAP and OLTP?

Criteria | OLTP | OLAP
Basic | It is an online transactional system and manages database modification. | It is an online data retrieval and data analysis system.
Characteristics | Handles a large number of small transactions | Handles large volumes of data with complex queries
Query types | Simple standardized queries | Complex queries
Operations | Based on INSERT, UPDATE, DELETE commands | Based on SELECT commands to aggregate data for reporting
Response time | Milliseconds. The processing time of a transaction is comparatively short in OLTP. | Seconds, minutes, or hours depending on the amount of data to process. The processing time is comparatively long in OLAP.
Design | Industry-specific, such as retail, manufacturing, or banking | Subject-specific, such as sales, inventory, or marketing
Source | Transactions | Aggregated data from transactions
Purpose | Control and run essential business operations in real time. Process transactions quickly. | Plan, solve problems, support decisions, and discover hidden insights. Business intelligence or reporting.
Data updates or performance | Short, fast updates initiated by the user | Data periodically refreshed with scheduled, long-running batch jobs
Space requirements | Generally small (smaller than OLAP) if historical data is archived. Space depends on the number of transactions to be processed and the length of online storage. | Generally large due to aggregating large datasets
Backup and recovery | Regular backups are required to ensure business continuity and meet legal and governance requirements | Lost data can be reloaded from the OLTP database as needed in lieu of regular backups
Productivity | Increases productivity of end users | Increases productivity of business managers, data analysts, and executives
Data view | Lists day-to-day business transactions | Multi-dimensional view of enterprise data
User examples | Customer-facing personnel, clerks, online shoppers | Knowledge workers such as data analysts, business analysts, and executives
Database design or normalization | Normalized databases for efficiency. Tables in an OLTP database are normalized (3NF). | Denormalized databases for analysis. Tables in an OLAP database are not normalized.
Data | OLTP and its transactions are the original source of data. | Different OLTP databases become the source of data for OLAP.
Transaction | OLTP has short transactions. | OLAP has long transactions.
Integrity | An OLTP database must maintain data integrity constraints. | An OLAP database is not modified frequently. Hence, data integrity is not affected.
Availability | Generally, 24x7x365 availability is essential when transactions are performed every second of every day. | Interactions are less frequent; the absence of an OLAP system should not impact operations.
Use case examples | Operational: applications used concurrently by many users, such as order entry, financial transactions, customer relationship management, and retail sales. Examples are online ticket bookings, banking, e-commerce websites, fintech, and other businesses where there are thousands or millions of transactions per day. | Informational: trend analysis and data patterns, predicting risks and outcomes, generating reports, and tracking customer behavior and buying patterns. Examples include creating sales and marketing reports, preparing forecasts, and business process management.
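The "Operations" row of the table above can be illustrated with a small sketch: a short write transaction for the OLTP side and a read-only aggregate query for the OLAP side. It uses Python's sqlite3 module on a single hypothetical orders table. In practice the analytical query would usually run on a separate warehouse, so this is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, product TEXT, amount REAL)"
)


def place_order(conn, customer, product, amount):
    """OLTP-style work: one short write transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, product, amount) VALUES (?, ?, ?)",
            (customer, product, amount),
        )
    return cur.lastrowid


def revenue_by_product(conn):
    """OLAP-style work: a read-only query that aggregates many rows for reporting."""
    return conn.execute(
        "SELECT product, COUNT(*), SUM(amount) FROM orders GROUP BY product ORDER BY product"
    ).fetchall()


place_order(conn, "customer_a", "book", 12.5)
place_order(conn, "customer_b", "book", 7.5)
place_order(conn, "customer_a", "pen", 2.0)
report = revenue_by_product(conn)
print(report)
```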

What is OLTP?
Online Transaction Processing (OLTP) is a class of systems that manage transaction data.
OLTP handles the day-to-day transactions of an organization.
Examples: uses of OLTP are as follows:
• An ATM center is an OLTP application.
• OLTP maintains the ACID properties (atomicity, consistency, isolation, durability) during
data transactions made through the application.
• It is also used for online banking, online airline ticket booking, sending a text
message, and adding a book to a shopping cart.
• A customer uses a credit card for various transactions, and the banks maintain this
transaction data. The database has multiple fields for every transaction.
Therefore, organizations use OLTP to maintain the database for transaction data.

In other words, OLTP is used in transaction-oriented applications. Many organizations,
such as banks and retailers, use OLTP software. The main advantage of OLTP is that it
handles many transactions at the same time.
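The ACID properties mentioned above can be made concrete with a minimal sketch of atomicity: a transfer either applies both updates or neither. The accounts table, the names, and the amounts are hypothetical, and sqlite3 is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint (an assumption of this sketch) forbids negative balances.
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100.0), ("bob", 20.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commit on success, rollback on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30.0)       # succeeds
failed = transfer(conn, "alice", "bob", 500.0)  # rejected; nothing changes
print(ok, failed, balances(conn))
```

The destination is credited first on purpose, so that the failed transfer shows an already applied statement being undone by the rollback.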

Advantages of OLTP
• It is a user-friendly application that anyone can use.
• OLTP can quickly perform actions like reading, modifying, and deleting data.
• It processes queries quickly. Hence, it responds to user actions quickly.
• OLTP generates the data.
• OLTP applications are built to perform business tasks.

Challenges of OLTP
• Multiple users can use OLTP applications simultaneously. The data processed
by different systems should not conflict. Therefore, concurrency control is
necessary.
• If a system fails, a replacement system should resume the application from the
same point without corrupting the data.
• OLTP is used only for transacting data. It cannot analyze the data. In other
words, it does not support decision-making.

Characteristics of OLTP
• It transacts a small amount of data at a time.
• OLTP indexes the data so that it can be accessed easily.
• Many users can use the OLTP application simultaneously.
• It responds to queries quickly.
• It only performs predefined functions on the data.
• OLTP stores the data in the database.
• It is user-friendly.

Example of OLTP

When we go to an ATM to withdraw money, the bank performs a simple transaction. In the
case of a joint account, however, both account holders may try to withdraw the total
amount at the same time. The person who completes the process first gets the money.
Here OLTP makes sure that the amount requested is never more than the balance
available in the account.
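The joint-account case above can be sketched with a guarded update: the balance is reduced only if it still covers the requested amount. The two withdrawals run one after the other here, which only simulates the race; the table name, account id, and amounts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joint_account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO joint_account VALUES (1, 1000.0)")
conn.commit()


def withdraw(conn, account_id, amount):
    """Withdraw only if the balance covers the amount; return True on success."""
    with conn:
        cur = conn.execute(
            "UPDATE joint_account SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 1 when the guard matched and the row was updated, otherwise 0.
    return cur.rowcount == 1


first = withdraw(conn, 1, 1000.0)   # first account holder gets the money
second = withdraw(conn, 1, 1000.0)  # second request is refused
remaining = conn.execute("SELECT balance FROM joint_account WHERE id = 1").fetchone()[0]
print(first, second, remaining)
```

Putting the balance check inside the UPDATE statement itself means the check and the change happen as one step, so the second request cannot see a stale balance.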

There are many examples of OLTP, such as:

• Making an online transaction by using online banking services.
• Booking railway tickets online.
• Online shopping on e-commerce websites.
• Sending e-mail with a mail application.
OLTP and the types of queries

OLTP supports database queries. It is an online database management system. It can
update, insert, and delete data with the help of database queries. But OLTP does not
support all queries: it does not support queries that involve a decision-making task.

Queries that OLTP can process:

• It can send the complete information about a product to the user.
• It can apply filters related to different categories. For example, if you want to see
the products from a particular supplier, OLTP can process that query.
• OLTP can give the details of a particular customer.
• It can also run a query that lists the products according to their amount.

Queries that OLTP does not support:

• OLTP cannot decide how much discount to offer for a particular item.
• It does not support a query asking which product should be highlighted to
customers.
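The supported query types listed above are simple lookups and filters. A short sketch with sqlite3 follows; the products and customers tables, their columns, and all values are hypothetical, and "amount" is interpreted here as price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, supplier TEXT, price REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO products VALUES (1, 'Pen', 'Acme', 2.0);
    INSERT INTO products VALUES (2, 'Notebook', 'Acme', 5.0);
    INSERT INTO products VALUES (3, 'Lamp', 'Bright', 30.0);
    INSERT INTO customers VALUES (1, 'Sara', 'Hawassa');
""")


def product_details(conn, product_id):
    """Complete information about one product."""
    return conn.execute("SELECT * FROM products WHERE id = ?", (product_id,)).fetchone()


def products_by_supplier(conn, supplier):
    """Filter: names of the products from a particular supplier."""
    rows = conn.execute(
        "SELECT name FROM products WHERE supplier = ? ORDER BY name", (supplier,)
    )
    return [name for (name,) in rows]


def customer_details(conn, customer_id):
    """Details of a particular customer."""
    return conn.execute("SELECT * FROM customers WHERE id = ?", (customer_id,)).fetchone()


def products_by_price(conn):
    """List products ordered by price, cheapest first."""
    return conn.execute("SELECT name, price FROM products ORDER BY price").fetchall()


print(product_details(conn, 3))
print(products_by_supplier(conn, "Acme"))
print(customer_details(conn, 1))
print(products_by_price(conn))
```

Each query reads a few rows by key or by a simple filter. Deciding a discount or choosing which product to promote would need analysis over historical data, which is the OLAP side.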
What is OLAP?

Online Analytical Processing (OLAP) is a category of software tools that allows
organizations to analyze data from various databases at the same time.
OLAP provides an environment for gaining insights from data retrieved from
multiple database systems at one time.
Examples: any type of data warehouse system is an OLAP system. Uses of OLAP
are as follows:
• Spotify analyzes the songs that users listen to in order to build a personalized
homepage of songs and playlists.
• Netflix uses it for its movie recommendation system.

OLAP works on a hypercube system, which means the data is organized into one or more
cubes. The cube is a multidimensional database model.

An OLAP cube collects and stores the data. Normally, data is stored in a two-dimensional
form: a database table with rows and columns. OLAP instead stores the data in
multidimensional databases, where each value is addressed by its position along several
dimensions.
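The difference between two-dimensional rows and a cube can be sketched in plain Python. Below, flat rows are folded into a dictionary keyed by one coordinate per dimension. The dimensions (city, quarter, product) and the unit counts are made up for illustration, and a dictionary is only one possible way to model a cube.

```python
from collections import defaultdict

# Flat, two-dimensional representation: one row per recorded sale.
rows = [
    ("Sydney", "Q1", "book", 5),
    ("Sydney", "Q1", "book", 2),
    ("Sydney", "Q2", "pen", 4),
    ("Perth", "Q1", "book", 3),
]


def build_cube(rows):
    """Fold rows into a cube: (city, quarter, product) -> total units."""
    cube = defaultdict(int)
    for city, quarter, product, units in rows:
        cube[(city, quarter, product)] += units
    return dict(cube)


cube = build_cube(rows)
# A cell is addressed by its coordinates along the three dimensions.
print(cube[("Sydney", "Q1", "book")])
```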

OLAP Operations

The data is stored in a cube, also known as a hypercube. Different operations are used
to work with these cubes. The five OLAP operations are as follows.

• Roll-up: It is the opposite of the Drill Down operation. This operation
aggregates the data in the OLAP cube. There are two different ways to
perform this operation.
o Moving up the concept hierarchy. For instance, if the data is stored at the
City level, moving up to the Country level combines the city figures into
one total per country.
o Removing one of the existing dimensions.
• Drill Down: It is the opposite of Roll-up. It shows more detailed information
from the cube. This can be done in two ways.
o Moving down the concept hierarchy. For instance, if the data is stored at the
quarter level, moving down to the month level shows the data in more
detail.
o Adding a new dimension to the cube.
o For example, fewer details appear if we access the data by quarter. If we
access the data by month, more details of the data are shown.
• Slice: In this operation, a sub-cube is created by selecting a single value on one
dimension.
o For instance, the Country dimension is sliced with the value C1.
o A new sub-cube is created.
• Dice: It is similar to the Slice operation. In this operation, a sub-cube is created
by selecting values on two or more dimensions. For instance, the locations "Sydney"
and "New Jersey" are selected and diced.
• Pivot: In Pivot, we rotate the axes of the data to provide a different view of the
same data.
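The five operations above can be sketched on a small cube held in a Python dictionary. This is an illustrative model only: the cube layout (city, month, product), the hierarchy mappings, and all numbers are assumptions, and real OLAP engines implement these operations quite differently.

```python
from collections import defaultdict

# Hypothetical concept hierarchies.
CITY_TO_COUNTRY = {"Sydney": "Australia", "Perth": "Australia", "New Jersey": "USA"}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}

# Base cube: (city, month, product) -> units sold. All values are made up.
base = {
    ("Sydney", "Jan", "book"): 5,
    ("Sydney", "Feb", "book"): 7,
    ("Perth", "Jan", "book"): 1,
    ("Perth", "Jan", "pen"): 3,
    ("New Jersey", "Apr", "book"): 4,
}


def aggregate(cube, key_fn):
    """Re-key every cell with key_fn and sum the cells that collide."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[key_fn(key)] += value
    return dict(out)


def roll_up_city_to_country(cube):
    """Roll-up by moving up the location hierarchy (city -> country)."""
    return aggregate(cube, lambda k: (CITY_TO_COUNTRY[k[0]], k[1], k[2]))


def roll_up_drop_product(cube):
    """Roll-up by removing the product dimension."""
    return aggregate(cube, lambda k: (k[0], k[1]))


def roll_up_month_to_quarter(cube):
    """Roll-up along the time hierarchy (month -> quarter)."""
    return aggregate(cube, lambda k: (k[0], MONTH_TO_QUARTER[k[1]], k[2]))


def drill_down_to_months(cube, city, quarter, product):
    """Drill down: expand one quarterly cell into its monthly cells."""
    return {k[1]: v for k, v in cube.items()
            if k[0] == city and MONTH_TO_QUARTER[k[1]] == quarter and k[2] == product}


def slice_cube(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}


def dice(cube, allowed):
    """Dice: keep cells whose coordinates lie in the allowed sets (axis -> values)."""
    return {k: v for k, v in cube.items()
            if all(k[axis] in values for axis, values in allowed.items())}


def pivot(cube, order):
    """Pivot: reorder the axes without changing any value."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


quarterly = roll_up_month_to_quarter(base)
print(quarterly[("Sydney", "Q1", "book")])
print(drill_down_to_months(base, "Sydney", "Q1", "book"))
print(slice_cube(base, 0, "Sydney"))
print(dice(base, {0: {"Sydney", "New Jersey"}, 2: {"book"}}))
```

Drill down is shown as a return to the finer-grained base cube, because detail cannot be recovered from an aggregated value alone.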
Different Types of OLAP Systems

OLAP systems are divided into three main types.

ROLAP

ROLAP (Relational OLAP) systems work with relational databases. The data is stored in
relational tables, and ROLAP can still perform multidimensional analysis on this data.
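One way to picture ROLAP is as multidimensional questions translated into SQL over ordinary tables. The sketch below groups a relational sales table by whichever dimensions are requested. The table, the column names, and the helper rolap_query are hypothetical and stand in for what a real ROLAP engine would generate.

```python
import sqlite3

DIMENSIONS = ("city", "quarter", "product")  # hypothetical dimension columns

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, quarter TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Sydney", "Q1", "book", 5),
    ("Sydney", "Q1", "pen", 2),
    ("Sydney", "Q2", "book", 7),
    ("Perth", "Q1", "book", 3),
])


def rolap_query(conn, dims):
    """Aggregate units over the requested dimensions with a generated GROUP BY."""
    if not dims:
        raise ValueError("at least one dimension is required")
    for dim in dims:
        if dim not in DIMENSIONS:  # only known column names reach the SQL text
            raise ValueError(f"unknown dimension: {dim}")
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(units) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()


print(rolap_query(conn, ["city"]))
print(rolap_query(conn, ["city", "quarter"]))
```

The totals are computed at query time from the detailed rows, which is what lets ROLAP scale with the relational database but also makes each query do more work.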

Advantages of ROLAP

• ROLAP benefits from the query optimization and the standard access language (SQL)
of the underlying relational database. Therefore, data efficiency is high
while using ROLAP systems.
• ROLAP is a scalable system, as it can manage large amounts of data. It also keeps
working smoothly as the data grows.

Disadvantages of ROLAP

• ROLAP systems demand more software and infrastructure resources, and more
staff to manage them.
• ROLAP query performance is lower than that of MOLAP.

MOLAP

MOLAP (Multidimensional OLAP) systems work with multidimensional databases. These
systems use array-based storage engines. In other words, MOLAP uses the OLAP hypercube
concept.
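Array-based storage can be sketched as a dense three-dimensional array in which each dimension value maps to an index position. Nested Python lists stand in for the storage engine here; the dimension labels and the facts are made up for illustration.

```python
CITIES = ["Sydney", "Perth"]
QUARTERS = ["Q1", "Q2"]
PRODUCTS = ["book", "pen"]


def make_index(labels):
    """Map each dimension label to its position along the axis."""
    return {label: i for i, label in enumerate(labels)}


city_ix = make_index(CITIES)
quarter_ix = make_index(QUARTERS)
product_ix = make_index(PRODUCTS)

# Dense array: every (city, quarter, product) cell exists, even if it stays at 0.
cube = [[[0 for _ in PRODUCTS] for _ in QUARTERS] for _ in CITIES]


def load(facts):
    """Add each fact's units into its cell of the array."""
    for city, quarter, product, units in facts:
        cube[city_ix[city]][quarter_ix[quarter]][product_ix[product]] += units


def cell(city, quarter, product):
    """Direct positional lookup; no scan over rows is needed."""
    return cube[city_ix[city]][quarter_ix[quarter]][product_ix[product]]


def total_for_city(city):
    """Sum one city's slab of the array over quarters and products."""
    return sum(sum(row) for row in cube[city_ix[city]])


load([
    ("Sydney", "Q1", "book", 5),
    ("Sydney", "Q1", "pen", 2),
    ("Sydney", "Q2", "book", 7),
    ("Perth", "Q1", "book", 3),
])
print(cell("Sydney", "Q1", "pen"), cell("Perth", "Q2", "pen"), total_for_city("Sydney"))
```

Lookups are fast because a cell's position is computed directly from its coordinates. The array also reserves space for empty cells, which hints at why a dense cube becomes harder to scale as dimensions grow.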

Advantages of MOLAP

• MOLAP can analyze a large amount of multidimensional data.
• The size of the data is smaller than that of the data in relational databases.
• It delivers faster query performance.
• MOLAP can manage a large amount of less-defined data.
• Organizations perform Slicing and Dicing operations on these systems.

Disadvantages of MOLAP

• MOLAP manages limited data. Therefore, it is less scalable.
• This system cannot hold detailed data. The cube cannot store a large amount
of data.

HOLAP

HOLAP, or Hybrid OLAP, is a mixture of both MOLAP and ROLAP systems. This
gives HOLAP the important features of both systems: it takes the scalability of
ROLAP and the faster processing speed of MOLAP. Hybrid OLAP uses two databases.
Organizations use a multidimensional database to store computed (aggregated) data,
whereas they use a relational database to store large volumes of detailed data.
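The two-store idea can be sketched as follows: detailed rows stay in a relational table, precomputed totals live in an in-memory structure, and each question is answered from the store that suits it. The table sales_detail, the summary dictionary, and the function names are assumptions made for this illustration.

```python
import sqlite3

# Relational (ROLAP-like) store for the detailed data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (city TEXT, quarter TEXT, product TEXT, units INTEGER)")
conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?, ?)", [
    ("Sydney", "Q1", "book", 5),
    ("Sydney", "Q1", "pen", 2),
    ("Sydney", "Q2", "book", 7),
    ("Perth", "Q1", "book", 3),
])


def precompute_summary(conn):
    """Build the multidimensional (MOLAP-like) store of computed totals."""
    rows = conn.execute(
        "SELECT city, quarter, SUM(units) FROM sales_detail GROUP BY city, quarter"
    )
    return {(city, quarter): total for city, quarter, total in rows}


summary = precompute_summary(conn)


def total_units(city, quarter):
    """Aggregate question: answered from the precomputed summary."""
    return summary.get((city, quarter), 0)


def detail_rows(conn, city, quarter):
    """Detail question: answered from the relational table."""
    return conn.execute(
        "SELECT product, units FROM sales_detail "
        "WHERE city = ? AND quarter = ? ORDER BY product",
        (city, quarter),
    ).fetchall()


print(total_units("Sydney", "Q1"))
print(detail_rows(conn, "Sydney", "Q1"))
```

Keeping the two stores consistent after new data arrives is extra work, which is one source of the added complexity of hybrid systems.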

Advantages of HOLAP

• It is convenient to use, as it is compact and saves disk space.
• It uses the OLAP hypercube concept. Hence, it can process all types of data faster.

Disadvantages of HOLAP

• It uses both ROLAP and MOLAP systems. Therefore, the system becomes
more complex.
• Both of these systems have their own functions. Therefore, they might overlap with
each other.
