
Difference between Data Warehousing and Data Mining

A data warehouse is built to support management functions whereas data mining is used to extract useful information and
patterns from data. Data warehousing is the process of compiling information into a data warehouse.

Data Warehousing:

Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared
and analyzed, rather than used for transaction processing. A data warehouse is designed to support management
decision-making by providing a platform for data cleaning, data integration, and data consolidation. It contains
subject-oriented, integrated, time-variant, and non-volatile data. The data warehouse consolidates data from many
sources while ensuring data quality, consistency, and accuracy, and it improves system performance by separating
analytical processing from transactional databases. Data flows into the warehouse from the various operational
databases. The warehouse organizes this data into a schema that describes the layout and type of the data, and query
tools use that schema to analyze the data tables.

For example, a data warehouse might combine customer information from an organization's point-of-sale
systems, its mailing lists, website, and comment cards.
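
The consolidation described in the example above can be sketched in a few lines of Python. This is only an illustration, not how any particular warehouse product works: the three source lists, their field names, and the choice of a cleaned e-mail address as the matching key are all assumptions made for the example.

```python
# Hypothetical source extracts; field names are illustrative assumptions.
pos_sales = [
    {"email": "ana@example.com", "name": "Ana Ruiz", "last_purchase": "2023-01-14"},
    {"email": "bo@example.com", "name": "Bo Chen", "last_purchase": "2023-02-02"},
]
mailing_list = [
    {"email": "ANA@example.com ", "subscribed": True},   # same customer, messy key
    {"email": "cy@example.com", "subscribed": False},
]
web_accounts = [
    {"email": "bo@example.com", "username": "bochen"},
]


def clean_key(email):
    """Data cleaning: normalize the key so one customer matches across sources."""
    return email.strip().lower()


def consolidate(*sources):
    """Data integration: merge records sharing a cleaned e-mail into one row."""
    warehouse = {}
    for source in sources:
        for record in source:
            key = clean_key(record["email"])
            row = warehouse.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email":
                    row[field] = value
    return warehouse


customers = consolidate(pos_sales, mailing_list, web_accounts)
for row in customers.values():
    print(row)
```

Matching on a normalized key is the simplest possible integration rule; real pipelines usually need fuzzier matching and conflict resolution when two sources disagree.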

Figure: Data Warehousing process

Advantages of Data Warehousing:

• A data warehouse's job is to make any form of corporate data easier to understand. Most of the user's work consists
of supplying the raw data.
• The data can be refreshed frequently, which is a key benefit of this technology. As a result, data warehouses suit
organizations and entrepreneurs who want to stay current with their target audience and customers.
• It makes data more accessible to businesses and organizations.
• A data warehouse holds a large volume of historical data that users can analyze across different periods and trends
in order to make predictions.

Disadvantages of Data Warehousing:

• There is a considerable risk of accumulating irrelevant and useless data. Data loss and erasure are other potential
issues.
• Data is gathered from various sources, so it must be cleansed and transformed before it is loaded, which can be a
difficult task.

Data Mining:

Data mining is the process of finding patterns and correlations within large data sets in order to identify
relationships in the data. Data mining tools allow a business to predict customer behavior, build risk models, and
detect fraud. Data mining is used in market analysis and management, fraud detection, corporate analysis, and risk
management.

For example, in the medical field data mining enables more accurate diagnostics. Having all of the patient's
information, such as medical records, physical examinations, and treatment patterns, allows more effective treatments
to be prescribed. It also enables more effective, efficient, and cost-effective management of health resources by
identifying risks, predicting illnesses in certain segments of the population, or forecasting the length of hospital
admission. Detecting fraud and irregularities, and strengthening ties with patients through better knowledge of their
needs, are further advantages of using data mining in medicine.

Figure: Data Mining process

Advantages of Data Mining:

• Data mining aids in a variety of data analysis and sorting procedures. One of its best applications is detecting
undesired faults in a system, which allows threats to be eliminated sooner.
• Compared with other statistical data applications, data mining methods are both cost-effective and efficient.
• Companies can take advantage of this analytical tool to provide appropriate and easily accessible knowledge-based
data.

Disadvantages of Data Mining:

• Data mining isn't always 100 percent accurate, and if done incorrectly, it can lead to data breaches.
• Organizations must devote a significant amount of resources to training and implementation. Furthermore, data
mining tools work in different ways depending on the algorithms used to build them.

Comparison between Data Mining and Data Warehousing:

| S. No. | Basis of Comparison | Data Warehousing | Data Mining |
|---|---|---|---|
| 1. | Definition | A data warehouse is a database system that is designed for analytical analysis instead of transactional work. | Data mining is the process of analyzing data patterns. |
| 2. | Process | Data is stored periodically. | Data is analyzed regularly. |
| 3. | Purpose | Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns. |
| 4. | Managing Authorities | Data warehousing is solely carried out by engineers. | Data mining is carried out by business users with the help of engineers. |
| 5. | Data Handling | Data warehousing is the process of pooling all relevant data together. | Data mining is considered as a process of extracting data from large data sets. |
| 6. | Functionality | Subject-oriented, integrated, time-varying and non-volatile constitute data warehouses. | AI, statistics, databases, and machine learning systems are all used in data mining technologies. |
| 7. | Task | Data warehousing is the process of extracting and storing data in order to make reporting more efficient. | Pattern recognition logic is used in data mining to find patterns. |
| 8. | Uses | It extracts data and stores it in an orderly format, making reporting easier and faster. | This procedure employs pattern recognition tools to aid in the identification of access patterns. |
| 9. | Examples | When a data warehouse is connected with operational business systems like CRM (Customer Relationship Management) systems, it adds value. | Data mining aids in the creation of suggestive patterns of key parameters. Customer purchasing behavior, items, and sales are examples. As a result, businesses will be able to make the required adjustments to their operations and production. |

Features of Data mining

Data mining usually provides the following key features:

o Sift through the chaotic and repetitive noise in your data.
o Understand what is relevant, then make good use of that information to assess likely outcomes.
o Accelerate the pace of making informed decisions.

Types of Data Mining

Each of the following data mining techniques addresses different business problems and provides a different insight
into each of them. Understanding the type of business problem you need to solve helps in choosing the technique that
will yield the best results. Data mining types can be divided into two basic groups:

1. Predictive Data Mining Analysis
2. Descriptive Data Mining Analysis

1. Predictive Data Mining

As the name signifies, predictive data mining analyzes data to estimate what may happen later (in the future) in a
business. Predictive data mining can be further divided into four types:

o Classification Analysis
o Regression Analysis
o Time Series Analysis
o Prediction Analysis
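
As a small illustration of regression analysis from the list above, the sketch below fits a straight line by ordinary least squares and uses it to forecast the next value. The monthly sales figures are made up for the example, and a single-variable linear model is an assumption chosen for brevity, not a recommendation for real forecasting.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number and sales (in thousands).
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(f"slope={slope:.2f} intercept={intercept:.2f} forecast={forecast:.2f}")
```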

2. Descriptive Data Mining

The main goal of descriptive data mining tasks is to summarize the given data or turn it into relevant information.
Descriptive data mining tasks can be further divided into four types:

o Clustering Analysis
o Summarization Analysis
o Association Rules Analysis
o Sequence Discovery Analysis
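
To make association rules analysis from the list above more concrete, the sketch below computes the two standard measures of a rule, support and confidence, over a handful of shopping baskets. The baskets are invented for the example, and the function names are illustrative rather than taken from any library.

```python
# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is usually kept only when both measures exceed thresholds chosen by the analyst.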

Difference between Database System and Data Warehouse


Database System: A database system is used in the traditional way of storing and retrieving data. Its major task is
query processing. These systems are generally referred to as online transaction processing (OLTP) systems and are used
in the day-to-day operations of any organization.

Data Warehouse: A data warehouse is the place where a huge amount of data is stored. It is meant for users or knowledge
workers in the role of data analysis and decision making. These systems are supposed to organize and present data in
different formats and forms in order to serve the needs of a specific user for a specific purpose. These systems are
referred to as online analytical processing (OLAP) systems.

Difference between Database System and Data Warehouse:

| Database System | Data Warehouse |
|---|---|
| It supports operational processes. | It supports analysis and performance reporting. |
| Capture and maintain the data. | Explore the data. |
| Current data. | Multiple years of history. |
| Data is balanced within the scope of this one system. | Data must be integrated and balanced from multiple systems. |
| Data is updated when transaction occurs. | Data is updated on scheduled processes. |
| Data verification occurs when entry is done. | Data verification occurs after the fact. |
| 100 MB to GB. | 100 GB to TB. |
| ER based. | Star/Snowflake. |
| Application oriented. | Subject oriented. |
| Primitive and highly detailed. | Summarized and consolidated. |
| Flat relational. | Multidimensional. |
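
The "Star/Snowflake" and "Summarized and consolidated" entries above can be illustrated with a tiny star schema: one fact table surrounded by dimension tables, queried with an aggregate. The sketch uses Python's built-in sqlite3 module purely for convenience; the table names, columns, and rows are assumptions made up for the example, and a real warehouse would use a dedicated analytical engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The fact table holds the measures and points at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_date VALUES (10, 2021), (11, 2022);
INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0), (2, 11, 25.0);
""")

# A typical analytical query: summarize sales per year and category.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for year, category, total in rows:
    print(year, category, total)
```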

Similarity Between Database And Data Warehouse


Both a database and a data warehouse are storage systems that hold data. Typically, the bottom tier of a data warehouse
is a relational database, with data stored in rows and columns.

Both let multiple users access the same data at the same time. In both, you retrieve data by running queries: complex
queries are typical for a data warehouse, while simple queries are typical for an OLTP database. Finally, both a
company's data warehouse and its database can be hosted on-premises or in the cloud.

10+ Differences Between Database And Data Warehouse


Before comparing database and data warehouse, it helps to define two processing methods: Online Transaction
Processing (OLTP) and Online Analytical Processing (OLAP):

• Online transaction processing (OLTP): a class of systems capable of supporting transaction-oriented applications
such as online banking, shopping, order entry, or sending text messages.
• Online analytical processing (OLAP): once transaction data has been collected through OLTP, enterprises use OLAP to
extract insights and make more informed decisions. OLAP can therefore be defined as a computing method that allows
users to easily query data for analytical purposes.

How is a data warehouse different from a database? Is a data warehouse a database? Let's take a look at this table:

Data Warehouse Vs Database Comparison Table

| | Database | Data Warehouse |
|---|---|---|
| Definition | Database is an organized collection of data that is saved, modified, and returned to users in accordance with the users' specifications. | Data warehouse is a data management system to unify data from multiple sources for analysis and reporting purposes. |
| Main purpose | Record data | Analyze data |
| Design | When updating real-time data, database is most concerned with accuracy. | Data warehouse ensures that a wide variety of data can be accessed over time for the purpose of analysis. |
| Processing method | OLTP | OLAP |
| Orientation | Application oriented | Subject oriented |
| Focus | Transactions are the primary focus of database queries. | A data warehouse's primary function is to collect and analyze data from a variety of sources and provide reports. |
| Data Type | Database comes in all shapes and sizes: OLTP, CSV, text files, Excel spreadsheets, and XML files. The data is real-time and up-to-date. | Data warehouse sits atop other databases and is used for analysis. Data warehouse stores current and historical data. |
| Storage limit | Store data from a single application | Store data from multiple applications |
| Data designing technique | ER modeling techniques | Data modeling |
| Optimization | Single-point-transaction (SPT) database is optimized for read-write operations. Nearly all of the time, OLTP database queries are answered instantly. | In order to handle broad analytical queries, data warehouses are geared for retrieving big data sets and aggregating the data. |
| Reporting | Database uses a less dynamic reporting style. In most cases, these are one-time lists. It is possible to save these results in the form of PDFs. It's possible to mix data from multiple tables, and complicated queries may be required. | With a data warehouse, if you want to know how a business is doing, you can't just use a static report. The data is gathered and summarized into various reports for analysis. |
| Data Duplication | Normalized data. Tables and joins are complicated. Normalized data is used in an OLTP database to improve processing and efficiency, and there is no duplication of data with a database. | Denormalized data. Tables and joins are simple. With a data warehouse, the data is structured in an OLAP database in such a way that analysis and reporting are made easier. There are usually fewer tables in a basic structure since the data is denormalized. |
| Data availability | Data in database is available in real-time | Data is refreshed from source systems |
| Data schema | Data is structured in a flat relational approach method. Physical – Logical schema. | Data is structured in dimensional and denormalized approach method. Star – Snowflake schema. |
| Uptime | 99.99% uptime | Downtime is built in for uploading new data |
| Query | Simple transaction queries | Complex queries |
| Concurrent users | Supports many concurrent users who read and modify data through short transactions | Typically serves a smaller number of concurrent users, who mostly run read-only analytical queries |
| Data summary | Detailed data | Summarized data |
| Use case – Banking | Database stores customer information, account-related activities, payments, deposits, loans, credit cards… | Data warehouse manages the resources available on the desk. |
| Use case – Airlines | Database stores reservations and schedule information. | Data warehouse is used for airline system management operations, such as crew assignment, route analysis, frequent flyer program discount schemes for passengers… |
| Use case – Telecommunication | Database stores call records, monthly bills, balance maintenance… | Data warehouse is used for making product promotions, sales decisions, and distribution decisions. |
| Disadvantages | High budget due to hardware and software cost. Cannot perform complicated queries. May lose control over the data (privacy, ownership, security issues). | Highly complex for end-users. Requires training or hiring an expert for setting up. High maintenance. Extracting, loading, and cleaning data is time-consuming. May have problems that are undetected for years. |

Difference between OLAP and OLTP in DBMS


Online Analytical Processing (OLAP): Online analytical processing consists of software tools that are used for data
analysis in support of business decisions. OLAP provides an environment for gaining insights from data retrieved from
multiple database systems at one time. Any type of data warehouse system is an OLAP system. Examples of OLAP use:
• Spotify analyzes the songs users listen to in order to build a personalized homepage of songs and playlists.
• Netflix's movie recommendation system.

Online Transaction Processing (OLTP): Online transaction processing supports transaction-oriented applications in a
3-tier architecture. OLTP administers the day-to-day transactions of an organization. Examples of OLTP use:
• An ATM center is an OLTP application.
• OLTP maintains the ACID properties during data transactions via the application.
• It is also used for online banking, online airline ticket booking, sending a text message, and adding a book to a
shopping cart.
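
The ACID point above can be made concrete with a small money-transfer sketch showing atomicity: either both account updates are committed or neither is. It uses Python's built-in sqlite3 module; the accounts table, the CHECK constraint, and the transfer function are assumptions invented for the example rather than part of any banking system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)        # valid transfer
failed = transfer(conn, 1, 2, 500)   # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The credit is deliberately applied before the debit so that the failed transfer shows the rollback undoing an update that had already succeeded.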

Comparisons of OLAP vs OLTP:

| Sr. No. | Category | OLAP (Online analytical processing) | OLTP (Online transaction processing) |
|---|---|---|---|
| 1. | Definition | It is well-known as an online database query management system. | It is well-known as an online database modifying system. |
| 2. | Data source | Consists of historical data from various databases. | Consists only of current operational data. |
| 3. | Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS). |
| 4. | Application | It is subject-oriented. Used for data mining, analytics, decision making, etc. | It is application-oriented. Used for business tasks. |
| 5. | Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF). |
| 6. | Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations. |
| 7. | Task | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks. |
| 8. | Purpose | It serves to extract information for analysis and decision-making. | It serves to insert, update, and delete information in the database. |
| 9. | Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small because historical data is archived, for example MB or GB. |
| 10. | Queries | Relatively slow as the amount of data involved is large. Queries may take hours. | Very fast as the queries operate on 5% of the data. |
| 11. | Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database. |
| 12. | Backup and Recovery | It only needs backup from time to time as compared to OLTP. | Backup and recovery process is maintained rigorously. |
| 13. | Processing time | The processing of complex queries can take a lengthy time. | It is comparatively fast in processing because of simple and straightforward queries. |
| 14. | Types of users | This data is generally managed by CEO, MD, GM. | This data is managed by clerks, managers. |
| 15. | Operations | Only read and rarely write operations. | Both read and write operations. |
| 16. | Updates | With lengthy, scheduled batch operations, data is refreshed on a regular basis. | The user initiates data updates, which are brief and quick. |
| 17. | Nature of audience | Process that is focused on the market. | Process that is focused on the customer. |
| 18. | Database Design | Design with a focus on the subject. | Design that is focused on the application. |
| 19. | Productivity | Improves the efficiency of business analysts. | Enhances the user's productivity. |
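
The "multi-dimensional view" attributed to OLAP in the comparison above can be sketched as rolling the same facts up along different dimensions. The sales facts and the roll_up helper below are hypothetical and only meant to show the idea; OLAP engines implement this far more efficiently.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, quarter, amount).
facts = [
    ("North", "Q1", 100), ("North", "Q2", 150),
    ("South", "Q1", 80), ("South", "Q2", 120),
    ("North", "Q1", 50),
]


def roll_up(facts, *dims):
    """Sum amounts grouped by the chosen dimensions (0 = region, 1 = quarter)."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact[2]
    return dict(totals)


by_region = roll_up(facts, 0)             # one dimension
by_region_quarter = roll_up(facts, 0, 1)  # two dimensions
grand_total = roll_up(facts)              # no dimensions: a single overall total

print(by_region)
print(by_region_quarter)
print(grand_total)
```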

KDD Process in Data Mining


Data Mining – Knowledge Discovery in Databases(KDD).
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful,
previously unknown, and potentially valuable information from large datasets. The KDD
process in data mining typically involves the following steps:
1. Selection: Select a relevant subset of the data for analysis.
2. Pre-processing: Clean and transform the data to make it ready for analysis. This may
include tasks such as data normalization, missing value handling, and data integration.
3. Transformation: Transform the data into a format suitable for data mining, such as a
matrix or a graph.
4. Data Mining: Apply data mining techniques and algorithms to the data to extract useful
information and insights. This may include tasks such as clustering, classification,
association rule mining, and anomaly detection.
5. Interpretation: Interpret the results and extract knowledge from the data. This may
include tasks such as visualizing the results, evaluating the quality of the discovered
patterns and identifying relationships and associations among the data.
6. Evaluation: Evaluate the results to ensure that the extracted knowledge is useful,
accurate, and meaningful.
7. Deployment: Use the discovered knowledge to solve the business problem and make
decisions.
The KDD process is iterative: it usually takes several passes through the above steps
to extract accurate knowledge from the data.
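
The steps above can be sketched end to end on a toy data set. This is a deliberately simplified illustration: the sensor readings are invented, z-score thresholding stands in for the data mining step as a very basic form of anomaly detection, and the threshold of 1.5 is an arbitrary assumption.

```python
from statistics import mean, pstdev

# Hypothetical raw readings collected from two sensors.
raw = [
    {"sensor": "A", "value": 10.0},
    {"sensor": "A", "value": 11.0},
    {"sensor": "B", "value": 99.0},   # not relevant to this analysis
    {"sensor": "A", "value": None},   # missing reading
    {"sensor": "A", "value": 9.0},
    {"sensor": "A", "value": 10.0},
    {"sensor": "A", "value": 30.0},   # suspicious spike
]


def select(records, sensor):
    """Selection: keep only the subset relevant to the question."""
    return [r for r in records if r["sensor"] == sensor]


def preprocess(records):
    """Pre-processing: drop records with missing values."""
    return [r["value"] for r in records if r["value"] is not None]


def transform(values):
    """Transformation: convert values to z-scores."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def mine(values, scores, threshold=1.5):
    """Data mining: flag values whose z-score magnitude exceeds the threshold."""
    return [v for v, z in zip(values, scores) if abs(z) > threshold]


values = preprocess(select(raw, "A"))
scores = transform(values)
anomalies = mine(values, scores)

# Interpretation and evaluation are left to the analyst: is the spike real?
print("values:", values)
print("anomalies:", anomalies)
```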

Steps Involved in KDD Process:


Figure: KDD process

1. Data Cleaning: the removal of noisy and irrelevant data from the collection.
• Cleaning in case of missing values.
• Cleaning noisy data, where noise is a random error or variance.
• Cleaning with data discrepancy detection and data transformation tools.
2. Data Integration: the combination of heterogeneous data from multiple sources into a common source (data
warehouse).
• Data integration using data migration tools.
• Data integration using data synchronization tools.
• Data integration using the ETL (Extract-Transform-Load) process.
3. Data Selection: the process in which data relevant to the analysis is decided on and retrieved from the data
collection.
• Data selection using neural networks.
• Data selection using decision trees.
• Data selection using naive Bayes.
• Data selection using clustering, regression, etc.
4. Data Transformation: the process of transforming data into the form required by the mining procedure.
Data transformation is a two-step process:
• Data mapping: assigning elements from the source base to the destination to capture transformations.
• Code generation: creation of the actual transformation program.
5. Data Mining: the application of intelligent techniques to extract potentially useful patterns.
• Transforms task-relevant data into patterns.
• Decides the purpose of the model using classification or characterization.
6. Pattern Evaluation: identifying the truly interesting patterns that represent knowledge, based on given
interestingness measures.
• Find the interestingness score of each pattern.
• Uses summarization and visualization to make the data understandable by the user.
7. Knowledge Representation: a technique that uses visualization tools to represent data mining results.
• Generate reports.
• Generate tables.
• Generate discriminant rules, classification rules, characterization rules, etc.
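
Two of the steps above, cleaning missing values (step 1) and transforming data into the form a mining procedure expects (step 4), can be sketched as follows. Mean imputation and min-max scaling are just one common choice for each step, picked here as assumptions for illustration; the sample ages are made up.

```python
def fill_missing(values):
    """Data cleaning: replace missing entries (None) with the mean of known values."""
    known = [v for v in values if v is not None]
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, None, 40, 60]         # hypothetical attribute with one missing value
cleaned = fill_missing(ages)      # the gap is filled with the mean of 20, 40, 60
scaled = min_max_scale(cleaned)

print(cleaned)
print(scaled)
```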
Note:

• KDD is an iterative process in which evaluation measures can be enhanced, mining can be refined, and new data can be
integrated and transformed in order to get different and more appropriate results.
• Preprocessing of databases consists of data cleaning and data integration.

Advantages and Disadvantages:
Advantages of KDD:

1. Improves decision-making: KDD provides valuable insights and knowledge that can help organizations make better
decisions.
2. Increased efficiency: KDD automates repetitive and time-consuming tasks and makes the data ready for analysis, which
saves time and money.
3. Better customer service: KDD helps organizations gain a better understanding of their customers’ needs and
preferences, which can help them provide better customer service.
4. Fraud detection: KDD can be used to detect fraudulent activities by identifying patterns and anomalies in the data that
may indicate fraud.
5. Predictive modeling: KDD can be used to build predictive models that can forecast future trends and patterns.

Disadvantages of KDD:

1. Privacy concerns: KDD can raise privacy concerns as it involves collecting and analyzing large amounts of data, which
can include sensitive information about individuals.
2. Complexity: KDD can be a complex process that requires specialized skills and knowledge to implement and interpret
the results.
3. Unintended consequences: KDD can lead to unintended consequences, such as bias or discrimination, if the data or
models are not properly understood or used.
4. Data quality: The KDD process depends heavily on the quality of the data; if the data is not accurate or consistent,
the results can be misleading.
5. High cost: KDD can be an expensive process, requiring significant investments in hardware, software, and personnel.
6. Overfitting: The KDD process can lead to overfitting, a common problem in machine learning in which a model learns
the detail and noise in the training data to the extent that it performs worse on new, unseen data.
