Data, Database, Data Warehouse and Data Mining
DATA
Data is the plural of ‘datum’, originally a Latin noun meaning
“something given.”
● A vector is a basic data structure used in programming languages to store a sequence of data values.
● Vectors are similar to arrays, but their actual implementation and behavior differ.
● All items in a vector must be of the same type.
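As a rough illustration of the same-type rule, the sketch below uses Python's standard-library `array` module, which (unlike a Python list) accepts only elements of one declared type. This is an analogue chosen for the example, not the vector type of any particular language; the variable names are hypothetical.

```python
from array import array

# array("i") holds only C ints: every element must have this one declared type.
scores = array("i", [90, 85, 77])
scores.append(64)  # accepted: another int

try:
    scores.append(3.5)  # rejected: a float is not an int
except TypeError as err:
    print("rejected:", err)

print(scores.tolist())  # [90, 85, 77, 64]
```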
DATA FRAME
A DataFrame is a data structure that organizes data
into a 2-dimensional table of rows and columns,
much like a spreadsheet.
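A data frame can be pictured as a set of named columns of equal length, where position i across all columns forms row i. The sketch below models that idea with a plain Python dict of lists; the `table`, `num_rows`, and `row` names and the sample values are hypothetical, and this is not how any data-frame library is implemented internally.

```python
# Hypothetical column-oriented table: each column name maps to a list,
# and all lists share one length, so index i across the columns is row i.
table = {
    "name": ["Asha", "Ben", "Chen"],
    "age": [34, 28, 45],
    "city": ["Pune", "Leeds", "Taipei"],
}

def num_rows(t):
    """Return the row count, checking that every column has the same length."""
    lengths = {len(col) for col in t.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same length")
    return lengths.pop()

def row(t, i):
    """Return row i as a dict of column name -> value."""
    return {name: col[i] for name, col in t.items()}

print(num_rows(table))  # 3
print(row(table, 1))    # {'name': 'Ben', 'age': 28, 'city': 'Leeds'}
print(table["age"])     # one whole column: [34, 28, 45]
```

In practice a library such as pandas is used; `pandas.DataFrame(table)` accepts this same dict-of-lists shape.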
Characteristics:
6. Indexes: Indexes are data structures that improve the speed of data
retrieval operations on tables. They are created on one or more columns
of a table to facilitate fast lookup of data based on those columns.
Indexes are especially useful for columns frequently used in queries and
joins.
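To see an index change how a query runs, the sketch below uses Python's built-in `sqlite3` module: it asks SQLite for the query plan of the same lookup before and after creating an index on the `city` column. The `customers` table, its data, and the index name are invented for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [(f"cust{i}", "Pune" if i % 3 == 0 else "Leeds") for i in range(1000)],
)

QUERY = "SELECT * FROM customers WHERE city = ?"

def plan(sql):
    # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail text.
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, ("Pune",))]

before = plan(QUERY)  # a full scan of the table, e.g. "SCAN customers"
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
after = plan(QUERY)   # e.g. "SEARCH customers USING INDEX idx_customers_city (city=?)"

print(before)
print(after)
```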
A Few Examples
■ Banking: Data mining is used to predict successful loan
applicants and to detect credit card fraud.
■ Retail: Create effective advertisements based on past
responses.
■ Insurance: Predict the probability and cost of future
disasters based on past hurricanes or tornadoes.
■ Grocery stores: Analyze market baskets to find products
usually bought together. Running a sales promotion on
one item can improve sales of the other item at its normal
price.
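The market-basket idea from the grocery example can be sketched by counting how often each pair of products appears in the same basket, then computing the confidence of one rule. The baskets below are made-up data, and simple pair counting is an illustration rather than a production method.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]

item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter()
for basket in baskets:
    # sorted() puts each pair in a fixed order, so (a, b) and (b, a) share a count.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'butter'), 3)]

# Confidence of butter -> bread: of the baskets with butter, the share that also have bread.
confidence = pair_counts[("bread", "butter")] / item_counts["butter"]
print(confidence)  # 1.0
```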
DATA MINING
1. Apriori Algorithm: The Apriori algorithm is one of the most commonly used algorithms for
association rule mining. It employs a breadth-first search strategy to discover frequent
itemsets by iteratively generating candidate itemsets and pruning those that do not meet a
minimum support threshold.
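The level-wise (breadth-first) search described above can be sketched as follows. This is a minimal, unoptimized version that assumes `min_support` is an absolute transaction count and uses made-up baskets; real implementations count candidates far more efficiently.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data counts every candidate, then infrequent ones are dropped.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unite frequent (k-1)-itemsets that together form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]
frequent_sets = apriori(baskets, min_support=2)
for itemset, n in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```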
2. FP-Growth Algorithm: The FP-Growth (Frequent Pattern Growth) algorithm is an alternative
to the Apriori algorithm that uses a divide-and-conquer approach to efficiently discover
frequent itemsets without generating candidate itemsets. It constructs a compact data
structure called the FP-tree to represent the dataset and mines frequent patterns directly
from the tree structure.
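Below is a compact sketch of FP-Growth under the same assumptions (absolute `min_support`, toy baskets). The `Node`, `build_tree`, and `fp_growth` names are hypothetical. Each transaction is passed as an `(items, count)` pair so the same function can mine both the original data and the conditional pattern bases it builds recursively; no candidate itemsets are generated.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    counts = defaultdict(int)
    for items, n in transactions:
        for i in items:
            counts[i] += n
    counts = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, n in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        path = sorted((i for i in items if i in counts), key=lambda i: (-counts[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header, counts

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): count} for all frequent itemsets."""
    header, counts = build_tree(transactions, min_support)
    result = {}
    for item, support in counts.items():
        itemset = suffix | {item}
        result[itemset] = support
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree built from that base.
        result.update(fp_growth(base, min_support, itemset))
    return result

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]
patterns = fp_growth([(t, 1) for t in baskets], min_support=2)
print(patterns)
```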
3. Eclat Algorithm: Eclat (Equivalence Class Clustering and Bottom-Up Lattice Traversal) is another
popular method for mining frequent itemsets in association rule mining. Like the Apriori
algorithm, Eclat identifies sets of items that frequently occur together in transactions.
However, it gains efficiency by using a vertical data format: each item is stored with its
Transaction ID List (tid-list), the set of transactions that contain it, so the support of an
itemset is the size of the intersection of its items' tid-lists. Eclat explores itemsets depth-first.
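Eclat's vertical approach can be sketched as follows: the data is turned into item-to-tid-list sets, and the support of a larger itemset is the size of the intersection of tid-lists, explored depth-first. Again `min_support` is assumed to be an absolute count, the baskets are toy data, and the function name is hypothetical.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): count} using vertical tid-lists."""
    # Horizontal data (tid -> items) becomes vertical data (item -> set of tids).
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        # candidates: (item, tids) pairs, where tids are the tids of prefix | {item}.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Depth-first: intersect with the tid-lists of the remaining items.
            deeper = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    deeper.append((other, common))
            if deeper:
                extend(itemset, deeper)

    start = sorted(
        (item, tids) for item, tids in tidlists.items() if len(tids) >= min_support
    )
    extend(frozenset(), start)
    return result

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
    {"milk", "cereal", "bread"},
]
itemsets = eclat(baskets, min_support=2)
print(itemsets)
```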
ASSOCIATION RULES