
COURSE PROFICIENCY

Data Mining & Pattern Warehousing: 230602

Submitted to - Dr. Vikram Rajpoot

Submitted by - Ayushi Jain (0901io211015)


CO1: DESCRIBE BASICS OF DATA MINING INCLUDING DATA TYPES,
ADVANCED DATABASES, AND FUNCTIONALITIES

• In data mining, we work with various types of data, including structured (like tables in
databases), semi-structured (like XML files), and unstructured (like text documents or
images).

• Advanced databases used in data mining include relational databases, where data is
organized in tables with rows and columns; NoSQL databases, which are more flexible
and scalable for handling big data; and data warehouses, which store large volumes of
historical data for analysis.

• Data mining involves several key functionalities: clustering, classification, association
rule mining, and regression analysis.

CO2: CHOOSE APPROPRIATE DATA PRE-PROCESSING TECHNIQUES FOR
SPECIFIC REQUIREMENTS

• Data Cleaning: Detecting and correcting errors in the data, such as missing values or inconsistent formatting, to ensure accuracy.

• Normalization: Scaling numerical features to a standard range, such as 0 to 1, so that features measured in different units or scales do not dominate the analysis.

• Data Transformation: Converting data into a format suitable for analysis, such as encoding categorical variables as numerical values.

• Feature Selection: Choosing the features that contribute most to the prediction task, which reduces complexity and can improve model performance.

• Dimensionality Reduction: Reducing the number of features while retaining essential information, which speeds up processing and can reduce the risk of overfitting.

• Data Discretization: Grouping continuous values into intervals or categories, which simplifies analysis and interpretation.
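
As a rough illustration of two of the techniques above, the following sketch shows min-max normalization and equal-width discretization in plain Python. The function names and the sample ages are assumptions made for this example; equal-width binning is only one of several possible discretization schemes.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 30, 45, 60]  # hypothetical sample data
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 3)
print(normalized)
print(bins)
```

With three bins over the range 18 to 60, each interval is 14 units wide, so the first three ages share the lowest bin.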
CO3: COMPARE VARIOUS ASSOCIATION RULE MINING ALGORITHMS FOR
PRACTICAL APPLICATIONS

• Apriori Algorithm: A widely used algorithm that finds frequent itemsets level by level: it generates
candidate itemsets of increasing size and prunes those that do not meet the minimum support.

• FP-Growth (Frequent Pattern Growth) Algorithm: This algorithm builds a frequent pattern tree
(FP-tree) and mines frequent itemsets from it without generating candidates, which typically makes
it more efficient than Apriori.

• Eclat Algorithm: Eclat stands for "Equivalence Class Clustering and bottom-up Lattice Traversal."
Like Apriori, it mines frequent itemsets, but it uses a depth-first search over a vertical data layout
(each item mapped to the set of transactions containing it) and computes support by intersecting
those sets.

• FP-Tree: The FP-tree is not a separate algorithm but the data structure at the core of FP-Growth:
a compressed prefix-tree representation of the transaction database that lets frequent itemsets be
mined without repeatedly scanning the database.
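
The candidate-generation and pruning idea behind Apriori can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation: the function name, the use of an absolute support count, and the sample baskets are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # Count how many transactions contain each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    current = count_frequent(items)
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


baskets = [  # hypothetical market-basket data
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this data every single item and every pair is frequent, but the three-item set appears in only one basket and is discarded.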
CO4: EXPLAIN DIFFERENT METHODS FOR CLASSIFICATION, PREDICTION,
AND CLUSTER ANALYSIS

Classification Methods:

• Decision Trees: These use a tree-like model of feature-based decisions to assign data
to categories.
• Support Vector Machines (SVM): An SVM finds the separating line (or hyperplane, in higher
dimensions) with the largest margin between classes and uses it to classify data.
• k-Nearest Neighbors (k-NN): It classifies a data point by the majority class among its k
nearest neighbors.
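
Of the classifiers above, k-NN is the simplest to sketch directly. The following is a minimal illustration, assuming Euclidean distance and a simple majority vote; the function name and the sample points are made up for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point with its label and sort by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]  # hypothetical data
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (2, 2), k=3)
near_b = knn_predict(points, labels, (6, 5), k=3)
print(near_a, near_b)
```

An odd k is often chosen for two-class problems so that the vote cannot tie.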
Prediction Methods:
1. Linear Regression: Predicts a continuous value by fitting a linear relationship between the independent variables and the
dependent variable.
2. Logistic Regression: Similar to linear regression, but passes the linear combination through a logistic (sigmoid) function to predict the probability of a categorical outcome.
3. Random Forest: An ensemble method that combines the predictions of many decision trees, by majority vote for classification or by averaging for regression.
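
For the single-variable case, linear regression can be illustrated with the ordinary least-squares formulas for slope and intercept. This is a sketch under that assumption only; the function name and the study-hours data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 54, 56, 58, 60]  # hypothetical exam scores, exactly 2 * hours + 50

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for 6 hours of study
print(slope, intercept, predicted)
```

Because the sample scores lie exactly on a line, the fit recovers that line; real data would leave residual error.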

Cluster Analysis Methods:

1. K-Means Clustering: Divides data into k clusters by assigning each point to the nearest cluster centroid and updating the centroids until the assignments stabilize.
2. Hierarchical Clustering: Builds a tree of clusters (a dendrogram) by successively merging (agglomerative) or splitting (divisive) clusters.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Groups together points that are
closely packed and marks points in low-density regions as noise.
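
The assign-and-update loop of k-means can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, caller-supplied initial centroids, and a fixed number of iterations instead of a convergence check; the function name and the sample points are made up for the example.

```python
import math


def k_means(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations from the given initial centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # hypothetical 2-D points
centroids, clusters = k_means(data, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids, which is why practical implementations usually try several initializations.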
