Data Warehouse

• What are the 5 components of a data warehouse?
Ans.
1. Source Systems: These are the systems that provide data to the data warehouse.
They can be transactional systems, operational databases, or other data sources.
2. Data Integration Processes: Data integration processes transform data from source
systems into a format that can be loaded into the data warehouse. These processes,
often called ETL (extract, transform, load), can include data cleansing, merging data
from multiple sources, and data transformation.
3. Data Warehouse Database: The data warehouse database is where the integrated
data is stored. It is optimized for complex queries, analysis, and data mining rather
than for day-to-day transaction processing.
4. Metadata: Metadata is data about the data in the data warehouse. It provides
information about the structure of the data warehouse, the meaning of the data, and
the relationships between the data.
5. Business Intelligence Tools: Business intelligence tools are used to access, analyze,
and report on the data in the data warehouse. These tools can include reporting
tools, data visualization tools, and analytics tools, which enable users to explore the
data, gain insights, and make informed decisions.
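The five components can be illustrated with a toy pipeline. This is a minimal sketch, not a production design: the source rows, the `sales` table, the `METADATA` dictionary, and the function names are all hypothetical, and Python's built-in `sqlite3` in-memory database stands in for a real warehouse database.

```python
import sqlite3

# 1. Source systems: hypothetical raw sales records from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "store": " north ", "product": "Milk", "amount": "2.50"},
    {"order_id": 2, "store": "South", "product": "bread", "amount": "1.75"},
    {"order_id": 3, "store": "north", "product": "EGGS", "amount": "3.10"},
    {"order_id": 4, "store": "south", "product": "milk", "amount": None},  # dirty row
]

# 4. Metadata: a hypothetical description of the warehouse table's columns.
METADATA = {
    "sales": {
        "store": "Store name, lower-case, trimmed",
        "product": "Product name, lower-case",
        "amount": "Sale amount in currency units",
    }
}


def extract_transform(rows):
    """2. Data integration: cleanse and standardise the source rows."""
    clean = []
    for row in rows:
        if row["amount"] is None:  # cleansing: drop rows with a missing amount
            continue
        clean.append((
            row["order_id"],
            row["store"].strip().lower(),  # standardise store names
            row["product"].lower(),        # standardise product names
            float(row["amount"]),          # convert text to a number
        ))
    return clean


def load(conn, rows):
    """3. Warehouse database: store the integrated data for analysis."""
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, store TEXT, "
        "product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)


def sales_by_store(conn):
    """5. Stand-in for a BI tool: an aggregate query a report might run."""
    return conn.execute(
        "SELECT store, ROUND(SUM(amount), 2) FROM sales GROUP BY store ORDER BY store"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract_transform(SOURCE_ROWS))
print(sales_by_store(conn))
```

In a real warehouse each stage would typically be a separate system (an ETL tool, a warehouse DBMS, a metadata catalog, a BI front end); the sketch only shows how data flows between them.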
• Why is correlation analysis useful in data mining?
Ans.
Correlation analysis is a statistical technique used to measure the strength and
direction of the relationship between two or more variables. In data mining,
correlation analysis is useful for several reasons:
1. Identify patterns: Correlation analysis can help to identify patterns and relationships
between variables in the data set. This can help to identify trends and patterns that
may not be immediately apparent.
2. Variable selection: Correlation analysis can help to identify which variables are most
strongly related to the outcome variable. This can help to reduce the number of
variables that need to be included in the analysis, making the analysis more efficient
and effective.
3. Prediction: Variables that are strongly correlated with the outcome variable can
serve as predictors, for example in a regression model, to estimate the outcome from
the values of other variables. This can be useful for forecasting future trends and
identifying potential problems.
4. Data exploration: Correlation analysis can help to explore the data set and identify
potential outliers or unusual data points that may need further investigation.
5. Data visualization: Correlation analysis can be used to create data visualizations, such
as scatter plots, that can help to communicate the relationships between variables in
the data set.
Overall, correlation analysis is a useful tool in data mining because it can help to
identify patterns, select variables, predict outcomes, explore data, and create data
visualizations.
• How do you find frequent patterns in data mining? Explain with an example.
Ans.
Frequent pattern mining is a data mining technique used to identify frequent
patterns, associations, and correlations among items in a data set. These
patterns can be used to understand customer behavior, recommend products,
and optimize business processes. Here is an example of how frequent pattern
mining can be used:
Suppose we have a data set that contains information about customer
transactions at a grocery store. Each transaction contains a list of items
purchased by the customer. We want to identify frequent itemsets, which
are groups of items that are frequently purchased together.
To do this, we can use an algorithm such as Apriori or FP-Growth, which are
commonly used for frequent pattern mining. Apriori works by generating
candidate itemsets and then pruning those that do not meet the minimum
support threshold; FP-Growth avoids explicit candidate generation by
compressing the transactions into a tree structure (an FP-tree) and mining it
directly. The support threshold is the minimum number (or fraction) of
transactions in which an itemset must appear to be considered frequent.
For example, if the support threshold is set to 3, an itemset (whether it
contains one item or several) must appear in at least three transactions to be
considered frequent. If the support threshold is set too low, we may end up
with too many frequent itemsets, which can be difficult to interpret. If the
support threshold is set too high, we may miss important patterns.
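The effect of the support threshold can be seen with a small brute-force sketch. It is deliberately not Apriori or FP-Growth: it simply counts every subset of every transaction, which only scales to tiny data sets. The transactions and the `frequent_itemsets` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery transactions.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]


def frequent_itemsets(transactions, min_support):
    """Brute force: count every itemset contained in each transaction,
    then keep those appearing in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)  # sorted so each itemset has one canonical tuple key
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


for threshold in (2, 3, 4):
    result = frequent_itemsets(transactions, threshold)
    print(f"min_support={threshold}: {len(result)} frequent itemsets")
```

On this data, raising the threshold from 2 to 4 shrinks the result from seven itemsets to three single items, which mirrors the trade-off described above.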
Once we have identified the frequent itemsets, we can use association rule
mining to identify rules that describe the relationships between items. For
example, we may find that customers who purchase milk and bread are also
likely to purchase eggs. This information can be used to optimize product
placement in the store or to create targeted marketing campaigns.
Overall, frequent pattern mining is a useful technique in data mining
because it allows us to identify patterns and associations in large data sets,
which can be used to make informed business decisions.
• What is association in data mining? Explain with an example.
Ans.
Association in data mining is a technique used to discover
patterns, relationships, and associations between variables in a large data
set. It is often used in market basket analysis, where the goal is to find
associations between items that are frequently purchased together. Here is
an example of how association mining can be used:
Suppose we have a data set that contains information about customer
transactions at a grocery store. Each transaction contains a list of items
purchased by the customer. We want to identify associations between
items that are frequently purchased together.
To do this, we can use an algorithm such as Apriori or FP-Growth, which are
commonly used for association mining. These algorithms find frequent
itemsets, which are groups of items that appear together in at least a minimum
percentage of transactions (the support threshold). For example, we may find
that bread and milk appear together in 30% of all transactions.
Once we have identified the frequent itemsets, we can use association rule
mining to identify rules that describe the relationships between items.
Association rules have the form "If item A is purchased, then item B is
also likely to be purchased" (written A → B). Each rule is usually judged by
its support (how often A and B appear together) and its confidence (the
fraction of transactions containing A that also contain B). For example, we
may find a rule stating "If bread is purchased, then milk is likely to be
purchased". This information can be used to optimize product placement in
the store, create targeted marketing campaigns, or bundle products together
to increase sales.
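Rules such as "If bread is purchased, then milk is likely to be purchased" are commonly scored by support (the fraction of all transactions containing both sides of the rule) and confidence (support of both sides divided by support of the left-hand side). A minimal sketch, assuming hypothetical transactions and helper names:

```python
from fractions import Fraction

# Hypothetical grocery transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """support(A ∪ B) / support(A); raises ZeroDivisionError if A never occurs."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rules = [({"bread"}, {"milk"}), ({"milk"}, {"eggs"}), ({"butter"}, {"bread"})]
for a, b in rules:
    s = float(support(a | b, transactions))
    c = float(confidence(a, b, transactions))
    print(f"{sorted(a)} -> {sorted(b)}: support={s:.2f}, confidence={c:.2f}")
```

Here the rule bread → milk has a confidence of 3/4: milk appears in three of the four transactions that contain bread.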
Association mining can also be used in other domains such as healthcare,
fraud detection, and web mining. For example, in healthcare, association
mining can be used to identify associations between symptoms and
diseases or between drugs and adverse reactions. In fraud detection,
association mining can be used to identify groups of individuals or
transactions that are associated with fraudulent activity. In web mining,
association mining can be used to identify pages that are frequently
accessed together, which can be useful for improving search engine results.
Overall, association mining is a powerful technique in data mining that
allows us to identify patterns and relationships in large data sets, which can
be used to make informed decisions and improve business processes.
• Explain the Apriori algorithm with an example.
Ans.
The Apriori algorithm is a popular algorithm for frequent itemset
mining and association rule learning in data mining. It is used to identify frequent
itemsets, which are groups of items that frequently appear together in a dataset.
Here's an explanation of how the Apriori algorithm works, along with an example:
Suppose we have a dataset of customer transactions in a grocery store. Each
transaction consists of a list of items purchased by a customer. Our goal is to identify
frequent itemsets, which are sets of items that are frequently purchased together.
The Apriori algorithm works in two steps:
1. Generate candidate itemsets: The algorithm starts with all itemsets of size 1, i.e.,
the individual items that appear in the dataset. It then generates candidate itemsets
of size k by combining frequent itemsets of size k-1. Any candidate that has an
infrequent subset of size k-1 can be discarded immediately, because every subset of a
frequent itemset must itself be frequent (the Apriori property). For example, if the
frequent itemsets of size 2 are {milk, bread}, {milk, eggs}, and {bread, eggs}, then the
only candidate itemset of size 3 is {milk, bread, eggs}.
2. Prune infrequent itemsets: The algorithm counts the support of each candidate and
prunes those that do not meet a minimum support threshold, which is the minimum
number of transactions that an itemset must appear in to be considered frequent. For
example, if the minimum support threshold is set to 2, any candidate itemset that
appears in only one transaction will be pruned.
The Apriori algorithm repeats these steps until no more frequent itemsets can be
found.
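Step 1 can be sketched as a candidate-generation function. It joins frequent (k-1)-itemsets and then drops any candidate with an infrequent (k-1)-subset, using the Apriori property that every subset of a frequent itemset must itself be frequent; support counting (step 2) is left out. The name `apriori_gen` is illustrative, and the join shown (the union of any two (k-1)-itemsets that yields exactly k items) is a simplified variant of the textbook sorted-prefix join that produces the same candidates after the subset check.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Build size-k candidate itemsets from frequent (k-1)-itemsets."""
    prev = {frozenset(s) for s in frequent_prev}
    # Join step: combine pairs whose union has exactly k items.
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune step (Apriori property): every (k-1)-subset must be frequent.
    return {
        c for c in candidates
        if all(frozenset(s) in prev for s in combinations(c, k - 1))
    }


frequent_2 = [{"milk", "bread"}, {"milk", "eggs"}, {"bread", "eggs"}]
print(apriori_gen(frequent_2, 3))      # the single candidate {milk, bread, eggs}
print(apriori_gen(frequent_2[:2], 3))  # empty: subset {bread, eggs} is missing
```

Dropping {bread, eggs} from the input shows the pruning at work: {milk, bread, eggs} can still be formed by the join, but it is discarded because one of its subsets is not frequent.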
Here's an example of how the Apriori algorithm can be applied to the grocery store
dataset:
Suppose we have the following transactions:
Transaction 1: milk, bread, eggs
Transaction 2: milk, bread, butter
Transaction 3: milk, bread, eggs, butter
Transaction 4: bread, eggs
We want to identify frequent itemsets with a minimum support of 2. Here's how the
Apriori algorithm would work:
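1. Generate candidate itemsets of size 1: {milk}, {bread}, {eggs}, {butter}
2. Calculate the support of each itemset: {milk} (3), {bread} (4), {eggs} (3), {butter} (2)
3. Prune itemsets that do not meet the minimum support threshold: all four items have a
support of at least 2, so {milk}, {bread}, {eggs}, and {butter} are all frequent.
4. Generate candidate itemsets of size 2: {milk, bread}, {milk, eggs}, {milk, butter},
{bread, eggs}, {bread, butter}, {eggs, butter}
5. Calculate the support of each itemset: {milk, bread} (3), {milk, eggs} (2), {milk, butter}
(2), {bread, eggs} (3), {bread, butter} (2), {eggs, butter} (1)
6. Prune itemsets that do not meet the minimum support threshold: {eggs, butter} is
pruned, leaving {milk, bread}, {milk, eggs}, {milk, butter}, {bread, eggs}, and {bread, butter}.
7. Generate candidate itemsets of size 3: {milk, bread, eggs} and {milk, bread, butter}.
Candidates such as {milk, eggs, butter} and {bread, eggs, butter} are skipped because
their subset {eggs, butter} is infrequent.
8. Calculate the support of each itemset: {milk, bread, eggs} (2), {milk, bread, butter} (2)
9. Prune itemsets that do not meet the minimum support threshold: both itemsets meet
it, so neither is pruned. The only size-4 candidate, {milk, bread, eggs, butter}, is skipped
because it contains the infrequent subset {eggs, butter}, so the algorithm stops.
The frequent itemsets that meet the minimum support threshold are {milk}, {bread},
{eggs}, {butter}, {milk, bread}, {milk, eggs}, {milk, butter}, {bread, eggs}, {bread, butter},
{milk, bread, eggs}, and {milk, bread, butter}. These frequent itemsets can be used to
identify association rules and make business decisions, such as product placement and
marketing campaigns.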
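A compact sketch that runs the whole loop (generate, count, prune, repeat) on the four transactions above with a minimum support of 2. The function name and the frozenset representation are illustrative choices, not part of any particular library.

```python
from itertools import combinations

# The four grocery transactions from the example.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread", "butter"},
    {"milk", "bread", "eggs", "butter"},
    {"bread", "eggs"},
]


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Size-1 candidates are the individual items.
    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unions of frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support and keep candidates that meet the threshold.
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


result = apriori(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Printing the result lists each frequent itemset with its support count, which makes it easy to check the hand calculation step by step.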
Overall, the Apriori algorithm is an efficient and effective way to identify frequent
itemsets in transaction data.
