Data Mining QB

QUESTION BANK
UNIT I
Unit 1: Introduction: Data mining – Functionalities – Classification – Introduction to Data Warehousing – Data
Preprocessing: Preprocessing the Data – Data cleaning – Data Integration and Transformation – Data Reduction
PART A – 2 MARKS
1. List out the steps of the KDD process. CO-1 K-1
• Data cleaning
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation
2. Define Data mining. CO-1 K-1
Data mining is the process of sorting through large data sets to identify patterns
and relationships that can help solve business problems through data analysis.
Data mining techniques and tools enable enterprises to predict future trends and
make more-informed business decisions.
3. List out the applications of data mining. CO-1 K-2
Data mining is applied in areas such as:
• Financial data analysis
• Retail industry
• Telecommunication industry
4. Differentiate data mining tools and query tools. CO-1 K-2
Query tools are used to easily build and run queries against a database, so they
retrieve information that the user already knows how to ask for. Data mining, on
the other hand, is a technique or concept in computer science that deals with
extracting useful and previously unknown information from raw data.
5. What is meant by machine learning? CO-1 K-1
Machine learning (ML) is a type of artificial intelligence (AI) that allows
software applications to become more accurate at predicting outcomes without
being explicitly programmed to do so. Machine learning algorithms use historical
data as input to predict new output values.
6. What are the techniques used in data mining?
Some of the most prevalent data mining techniques are clustering, data cleaning,
association, data warehousing, machine learning, data visualization,
classification, neural networks, and prediction.
7. Define clustering. CO-1 K-1
Clustering is the task of dividing the population or data points into a number of
groups such that data points in the same group are more similar to one another
than to data points in other groups. In simple words, the aim is to segregate
groups with similar traits and assign them into clusters.
8. Define regression. CO-1 K-1
The term "regression" refers to the process of determining the relationship
between one or more factors and a numeric output variable. The outcome variable
is called the response (dependent) variable, whereas the risk factors and
confounders are known as predictors or independent variables.
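As a small illustration, the sketch below fits a simple linear regression (a single predictor) by ordinary least squares. The function name fit_line and the data values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: years of experience (predictor) vs salary in thousands (response)
years = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
slope, intercept = fit_line(years, salary)
print(slope, intercept)        # 5.0 25.0
print(slope * 6 + intercept)   # predicted salary for 6 years: 55.0
```

The fitted line is then used for prediction by plugging in a new value of the predictor.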
9. Give the types of regression. CO-1 K-1
• Linear regression
• Logistic regression
• Polynomial regression
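10. What is classification? CO-1 K-1
Classification is the process of finding a model (or function) that describes and
distinguishes data classes, so that the model can be used to predict the class
label of objects whose class label is unknown. The model is derived from a set of
training data whose class labels are known.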
11. What is an association rule? CO-1 K-1
Association rules take the form “If antecedent, then consequent,” along with a
measure of the support and confidence associated with the rule.
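Support and confidence can be computed directly from a list of transactions. This sketch assumes each transaction is a Python set of item names (the items are made up): support is the fraction of transactions containing every item of the rule, and confidence is the support of antecedent and consequent together divided by the support of the antecedent.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A and B together) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Rule: if {bread}, then {milk}
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```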
12. Define prediction. CO-1 K-1
Prediction is the process of estimating missing or unavailable numerical data for
a new observation. In classification, accuracy depends on finding the class label
correctly. In prediction, accuracy depends on how well a given predictor can guess
the value of the predicted attribute for new data.
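13. Define binning. CO-1 K-1
Binning is a way to group a number of more or less continuous values into a
smaller number of "bins". In data cleaning, the values in each bin can then be
smoothed by replacing them with the bin mean, bin median, or bin boundaries.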
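The sketch below shows equal-frequency (equal-depth) binning followed by smoothing by bin means and by bin boundaries. The price values are illustrative, and the helper names are hypothetical; to keep it short, it assumes the number of values divides evenly by the number of bins.

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins holding the same number of values.

    Assumes len(values) is divisible by n_bins, to keep the sketch short.
    """
    data = sorted(values)
    size = len(data) // n_bins
    return [data[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_means(bins):
    """Smoothing by bin means: every value in a bin becomes the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Smoothing by bin boundaries: every value moves to the closer of the bin's min and max."""
    return [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b] for b in bins]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```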
14. Why is machine learning done? CO-1 K-1
Machine learning is important because it gives enterprises a view of trends in
customer behaviour and operational business patterns, and it supports the
development of new products. Many of today's leading companies, such as
Facebook, Google, and Uber, make machine learning a central part of their
operations.
15. Difference between supervised learning and unsupervised learning. CO-1 K-1
In supervised learning, input data is provided to the model along with the
correct output (class label or target value). In unsupervised learning, only
input data is provided, and the model must discover structure, such as clusters,
on its own.
16. Define data cleaning. CO-1 K-1
Data cleaning is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset. When
combining multiple data sources, there are many opportunities for data to be
duplicated or mislabelled.
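A minimal sketch of three common cleaning steps on a list of records: fixing inconsistent formatting, dropping duplicates, and filling missing values. The record layout and the choice of mean imputation are assumptions for illustration; other strategies (dropping the record, using the median, and so on) are equally valid.

```python
def clean(records):
    """Normalize names, drop duplicate records, and fill missing ages with the mean age."""
    seen = set()
    cleaned = []
    for r in records:
        # Fix inconsistent formatting first, so " Alice " and "alice" match
        name = r["name"].strip().title()
        key = (name, r["age"])
        if key in seen:
            continue  # duplicate record: skip it
        seen.add(key)
        cleaned.append({"name": name, "age": r["age"]})

    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = mean_age  # fill the missing value with the attribute mean
    return cleaned


raw = [
    {"name": "alice", "age": 30},
    {"name": " Alice ", "age": 30},  # duplicate once formatting is fixed
    {"name": "bob", "age": None},    # missing value
    {"name": "carol", "age": 40},
]
print(clean(raw))
```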
17. What is pattern evaluation? CO-1 K-1
Pattern evaluation is the KDD step that identifies the truly interesting patterns
representing knowledge, based on given interestingness measures. An
interestingness score is computed for each pattern, and summarization and
visualization are used to make the data understandable to the user.
18. What is descriptive and predictive data mining? CO-1 K-1
Descriptive data mining characterizes the general properties of the data in a
target data set and is generally supported by correlation, cross-tabulation,
frequency counts, etc. Predictive data mining, as the name suggests, performs
inference on the current data to predict future events or trends.
19. What are the goals of time series analysis? CO-1 K-1
There are two main goals of time series analysis: identifying the nature of the
phenomenon represented by the sequence of observations, and forecasting
(predicting future values of the time series variable).
20. Classify data mining systems. CO-1 K-1
Data mining systems can be classified according to the kinds of (a) databases
mined, (b) knowledge mined, (c) techniques utilized, and (d) applications adapted.
UNIT 2
Data Mining, Primitives, Languages and System Architecture: Data Mining – Primitives – Data Mining Query
Language, Architectures of Data mining Systems. Concept Description, Characterization and Comparison:
Concept Description, Data Generalization and summarization, Mining Class Comparison
PART A – 2 MARKS
1. Define data warehouse. CO-2 K-1
A data warehouse is a type of data management system that is designed to enable
and support business intelligence (BI) activities, especially analytics. Data
warehouses are solely intended to perform queries and analysis and often contain
large amounts of historical data.
2. What is the need for data warehouses? CO-2 K-4
A data warehouse is needed to generate reports, feed data to Business
Intelligence (BI) tools, forecast trends, and train machine learning models. It
stores data from multiple sources such as APIs, databases, and cloud storage,
loaded through the ETL (Extract, Transform, Load) process.
3. Define OLAP. CO-2 K-1
OLAP (for online analytical processing) is software for performing
multidimensional analysis at high speeds on large volumes of data from a data
warehouse, data mart, or some other unified, centralized data store.
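4. Define multidimensional data model. CO-2 K-1
The multidimensional data model views data in the form of a data cube. Data are
organized into dimensions (perspectives such as time, item, or location) and facts
(numeric measures such as sales), so that the contents of the database can be
arranged and analyzed from multiple perspectives.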
5. What is a data cube? CO-2 K-2
A data cube is a three-dimensional (3D) or higher-dimensional array of values,
defined by dimensions and facts, that allows data to be modeled and viewed in
multiple dimensions. It is a data abstraction for evaluating aggregated data from
a variety of viewpoints.
6. Define dimensions. CO-2 K-2
In a data warehouse, dimensions are the perspectives or entities with respect to
which an organization wants to keep records, such as time, item, branch, and
location. Each dimension may have a dimension table that describes it further.
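7. What are facts? CO-2 K-1
Facts are the numeric measures (such as dollars_sold or units_sold) by which the
relationships between dimensions are analyzed; they are stored in the fact table.
There are three types of facts:
• Additive (summative) facts can be aggregated with functions such as sum() and
  average() along all dimensions.
• Semi-additive facts can be aggregated along some dimensions but not others.
  For example, a bank account balance can be summed across accounts but not
  across time.
• Non-additive facts, such as ratios, cannot be summed along any dimension.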
8. Define OLTP. CO-2 K-4
OLTP (online transaction processing) is a class of software systems that supports
transaction-oriented applications, often in a three-tier architecture. It
facilitates and supports the execution of a large number of real-time
transactions in a database.
9. Define OLAP. CO-2 K-1
OLAP (for online analytical processing) is software for performing
multidimensional analysis at high speeds on large volumes of data from a data
warehouse, data mart, or some other unified, centralized data store.
10. Define dimension table. CO-2 K-1
A dimension table is a table in a star or snowflake schema that stores the
descriptive attributes of one dimension (for example, an item table with item
name, brand, and type). The fact table references each dimension table through
its key. Dimension tables are part of the database schema, the conceptual map
that shows the logical construction of the database.
11. Define fact table. CO-2 K-2
A fact table stores quantitative information for analysis and is often
denormalized. A fact table works with dimension tables. A fact table holds the
data to be analyzed, and a dimension table stores data about the ways in which
the data in the fact table can be analyzed.
12. What is a lattice of cuboids? CO-2 K-1
Given a set of dimensions, a cuboid can be generated for each possible subset of
the dimensions; together these cuboids form a lattice that makes up the data
cube. Without concept hierarchies, n dimensions give 2^n cuboids. The base cuboid
contains all n dimensions, and moving up the hierarchy we reach the
0-dimensional cuboid, called the apex cuboid. A new cuboid may be generated by
roll-up through dimension reduction.
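The lattice can be enumerated with itertools.combinations: one cuboid per subset of the dimensions, listed from the base cuboid down to the empty group-by (the apex). The dimension names are hypothetical, and the sketch ignores concept hierarchies, which would add more cuboids.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """List every cuboid (group-by set), from the base cuboid up to the apex cuboid."""
    lattice = []
    for k in range(len(dimensions), -1, -1):  # k = number of dimensions kept
        lattice.extend(combinations(dimensions, k))
    return lattice


cuboids = cuboid_lattice(["time", "item", "location"])
print(len(cuboids))   # 8, i.e. 2 ** 3
print(cuboids[0])     # ('time', 'item', 'location')  base cuboid
print(cuboids[-1])    # ()  apex cuboid, the "all" group-by
```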
13. What is the apex cuboid? CO-2 K-4
The apex cuboid, or 0-D cuboid, refers to the case where the group-by is empty.
In a sales cube, for example, it contains the total sum of all sales. The base
cuboid is the least generalized (most specific) of the cuboids. The apex cuboid is
the most generalized (least specific) of the cuboids, and is often denoted as all.
14. List out the various OLAP operations. CO-2 K-1
There are five primary types of analytical OLAP operations in a data warehouse:
1) Roll-up 2) Drill-down 3) Slice 4) Dice and 5) Pivot.
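Two of these operations, roll-up and slice, can be sketched on a tiny cube stored as a Python dictionary keyed by dimension values. The cube contents and function names are hypothetical; a real OLAP server works on far larger, often pre-aggregated structures.

```python
from collections import defaultdict

# Hypothetical sales cube: (year, quarter, city) -> units sold
cube = {
    (2023, "Q1", "Toronto"): 100,
    (2023, "Q2", "Toronto"): 150,
    (2023, "Q1", "Vancouver"): 80,
    (2024, "Q1", "Toronto"): 120,
}


def roll_up_quarter_to_year(cells):
    """Roll-up: climb the time hierarchy from quarter to year by summing the measure."""
    totals = defaultdict(int)
    for (year, _quarter, city), units in cells.items():
        totals[(year, city)] += units
    return dict(totals)


def slice_by_year(cells, year):
    """Slice: fix one value on the time dimension to obtain a sub-cube."""
    return {key: units for key, units in cells.items() if key[0] == year}


print(roll_up_quarter_to_year(cube))
# {(2023, 'Toronto'): 250, (2023, 'Vancouver'): 80, (2024, 'Toronto'): 120}
print(slice_by_year(cube, 2024))
# {(2024, 'Q1', 'Toronto'): 120}
```

Drill-down is the reverse of the roll-up shown here, and dice would filter on more than one dimension at once.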
15. Give the names of warehouse schemas. CO-2 K-1
Following are the three major types of schemas:
1. Star schema
2. Snowflake schema
3. Galaxy schema (fact constellation)
16. Define star schema. CO-2 K-2
A star schema is a database organizational structure optimized for use in a data
warehouse or business intelligence that uses a single large fact table to store
transactional or measured data, and one or more smaller dimensional tables that
store attributes about the data.
17. Define snowflake schema. CO-2 K-4
A snowflake schema is a logical arrangement of tables in a multidimensional
database such that the entity relationship diagram resembles a snowflake shape.
It is a variant of the star schema in which some dimension tables are normalized
into additional tables, so a centralized fact table is connected to multiple
dimension tables, which may in turn be connected to further tables.
18. Define metadata. CO-2 K-1
Metadata are data about data. In a data warehouse, metadata define the warehouse
objects: they describe the warehouse structure (schema, views, dimensions,
hierarchies), the data sources with the mappings and transformations applied to
them, the algorithms used for summarization, and operational information such as
data lineage and refresh history.
19. Define data mart. CO-2 K-1
A data mart is a simple form of data warehouse focused on a single subject or
line of business. With a data mart, teams can access data and gain insights faster,
because they don't have to spend time searching within a more complex data
warehouse or manually aggregating data from different sources.
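20. What are the applications of metadata? CO-2 K-2
In a data warehouse, metadata are used:
1. As a directory that helps decision-support analysts locate the contents of the
warehouse.
2. As a guide to mapping data from the operational environment into the warehouse
(extraction, cleaning, and transformation rules).
3. As a guide to the algorithms used for summarization.
4. To record data lineage and the history of data refreshes.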
UNIT 3
Mining Association Rules: Basic Concepts – Single-Dimensional Boolean Association Rules from Transaction
Databases – Multilevel Association Rules from Transaction Databases – Multidimensional Association Rules from
Relational Databases and Data Warehouses.
PART A – 2 MARKS
1. Define association rule mining. CO-3 K-2
Association rule mining analyzes data for patterns, or co-occurrences, among items
in a database. It identifies frequent if-then associations, which are the
association rules, that satisfy user-specified minimum support and minimum
confidence thresholds.
2. CO-3 K-1
rules
3. What is classification of association rule mining? CO-3 K-2
Classification rule mining aims to discover a small set of rules in the database
that forms an accurate classifier. Association rule mining finds all the rules
existing in the database that satisfy some minimum support and minimum
confidence constraints.
4. What is the purpose of Apriori algorithm? CO-3 K-2
Apriori is an algorithm for frequent itemset mining and association rule
learning over relational databases. It proceeds by identifying the frequent
individual items in the database and extending them to larger and larger itemsets
as long as those itemsets appear sufficiently often in the database.
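A compact sketch of the level-wise Apriori idea: count frequent 1-itemsets, join frequent (k-1)-itemsets into k-itemset candidates, prune candidates that have an infrequent subset, and scan the data again. It assumes an absolute minimum support count and small in-memory transactions; the basket data are a made-up market-basket example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset.

    min_count is an absolute support count (an assumption of this sketch;
    the threshold is often given as a percentage instead).
    """
    transactions = [frozenset(t) for t in transactions]

    def frequent_only(candidates):
        # One scan of the database per level, counting each candidate
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    level = frequent_only({frozenset([item]) for t in transactions for item in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent_only(candidates)
        result.update(level)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
for itemset, count in apriori(baskets, min_count=3).items():
    print(sorted(itemset), count)
```

Here {bread, milk, diaper} survives the prune step but appears in only two baskets, so no frequent 3-itemset is reported.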
5. Give two techniques to improve Apriori algorithm. CO-3 K-2
Techniques to improve the efficiency of Apriori algorithm:
• Hash-based technique
• Transaction reduction
6. Differentiate Apriori and FP-growth. CO-3 K-2
Apriori uses candidate generation, where frequent subsets are extended one item
at a time, and it scans the database at each of its steps; this becomes
time-consuming when the number of items is large. FP-growth avoids candidate
generation: it compresses the database into an FP-tree and mines it by building a
conditional FP-tree for every frequent item.
7. What is single dimensional association rule? CO-3 K-1
If the items or attributes in an association rule reference only one dimension,
then it is a single-dimensional association rule. For example, the rule
computer => antivirus_software [support = 2%, confidence = 60%]
could be written as
buys(X, "computer") => buys(X, "antivirus_software").
8. What is multidimensional association rule? CO-3 K-2
A multidimensional association rule involves two or more dimensions or
predicates, for example age(X, "20-29") ∧ occupation(X, "student") =>
buys(X, "laptop"). Its attributes can be categorical or quantitative.
Quantitative attributes are numeric and have an implicit ordering, so they must
be discretized.
9. What is hybrid dimensional association rule? CO-3 K-2
A hybrid-dimensional association rule is a multidimensional association rule in
which some predicate is repeated, for example age(X, "20-29") ∧ buys(X, "laptop")
=> buys(X, "printer"). Such rules can be mined from multidimensional transaction
databases with the Apriori technique; a Boolean-matrix-based approach has also
been employed to generate the frequent itemsets.
10. What is strong association rule? CO-3 K-2
A strong association rule is one whose support and confidence are greater than
or equal to a user-specified minimum support threshold and minimum confidence
threshold, respectively.
11. CO-3 K-2

rules
12. CO-3 K-1
rules
13. What is minimum support and minimum confidence of association rule mining? CO-3 K-2
Minimum support (min_sup) and minimum confidence (min_conf) are user-specified
thresholds. The support of a rule A => B is the percentage of transactions that
contain both A and B; its confidence is the percentage of transactions containing
A that also contain B. An itemset is frequent if its support is at least min_sup,
and a rule is strong if it satisfies both min_sup and min_conf.
14. What is the purpose of Apriori algorithm? CO-3 K-2
Apriori is an algorithm for frequent itemset mining and association rule
learning over relational databases. It proceeds by identifying the frequent
individual items in the database and extending them to larger and larger itemsets
as long as those itemsets appear sufficiently often in the database.
15. Give two techniques to improve Apriori algorithm. CO-3 K-2
Techniques to improve the efficiency of Apriori algorithm:
• Hash-based technique
• Transaction reduction
16. Differentiate Apriori and FP-growth. CO-3 K-2
Apriori uses candidate generation, where frequent subsets are extended one item
at a time, and it scans the database at each of its steps; this becomes
time-consuming when the number of items is large. FP-growth avoids candidate
generation: it compresses the database into an FP-tree and mines it by building a
conditional FP-tree for every frequent item.
17. What is single dimensional association rule? CO-3 K-2
If the items or attributes in an association rule reference only one dimension,
then it is a single-dimensional association rule. For example, the rule
computer => antivirus_software [support = 2%, confidence = 60%]
could be written as
buys(X, "computer") => buys(X, "antivirus_software").
18. What is multidimensional association rule? CO-3 K-1
A multidimensional association rule involves two or more dimensions or
predicates, for example age(X, "20-29") ∧ occupation(X, "student") =>
buys(X, "laptop"). Its attributes can be categorical or quantitative.
Quantitative attributes are numeric and have an implicit ordering, so they must
be discretized.
19. What is hybrid dimensional association rule? CO-3 K-2
A hybrid-dimensional association rule is a multidimensional association rule in
which some predicate is repeated, for example age(X, "20-29") ∧ buys(X, "laptop")
=> buys(X, "printer"). Such rules can be mined from multidimensional transaction
databases with the Apriori technique; a Boolean-matrix-based approach has also
been employed to generate the frequent itemsets.
20. What is strong association rule? CO-3 K-2
A strong association rule is one whose support and confidence are greater than
or equal to a user-specified minimum support threshold and minimum confidence
threshold, respectively.
UNIT 4
Classification and Prediction: Introduction – Issues – Decision Tree Induction – Bayesian Classification.
Classification based on Concepts from Association Rule Mining – Other Methods. Prediction – Introduction –
Classifier Accuracy
PART A – 2 MARKS
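1. What are the steps involved in preparing the data for classification? CO-4 K-2
The following preprocessing steps may be applied to the data to improve the
accuracy, efficiency, and scalability of classification:
• Data cleaning: reducing noise (for example, by smoothing) and treating missing
  values.
• Relevance analysis: removing irrelevant or redundant attributes, for example by
  correlation analysis and attribute subset selection.
• Data transformation and reduction: normalizing the data, or generalizing it to
  higher-level concepts using concept hierarchies.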
2. Define the concept of classification. CO-4 K-1
Classification is a data mining function that assigns items in a collection to target
categories or classes. The goal of classification is to accurately predict the target
class for each case in the data.
3. What is decision tree? CO-4 K-2
A decision tree is one of the most popular tools for classification and
prediction. It is a flowchart-like tree structure, where each internal node
denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label.
4. What is attribute selection measure? CO-4 K-2
An attribute selection measure is a heuristic for choosing the splitting test that
“best” separates a given data partition, D, of class-labelled training tuples into
individual classes. Popular measures include information gain, gain ratio, and
the Gini index.
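Information gain, the measure used by ID3, can be computed from class counts: Gain(A) = Info(D) - Info_A(D), where Info is the entropy of the class distribution. The training tuples below are hypothetical and chosen so that one attribute splits the classes perfectly and the other not at all.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class distribution of D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D) for a split on the attribute at index attr."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    info_a = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - info_a


# Hypothetical training tuples: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))  # 1.0, outlook splits the classes perfectly
print(information_gain(rows, labels, 1))  # 0.0, windy tells us nothing
```

A decision tree inducer would pick the attribute with the highest gain (here, outlook) as the splitting test.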
5. What is attribute selection measure? CO-4 K-2
An attribute selection measure is a heuristic for choosing the splitting test that
“best” separates a given data partition, D, of class-labelled training tuples into
individual classes.
6. Define prepruning. CO-4 K-2
Pre-pruning, also known as the early stopping rule, is the method where subtree
construction is halted at a particular node after evaluation of some measure, such
as Gini impurity or information gain; the node then becomes a leaf.
7. Define postpruning. CO-4 K-2
Post-pruning first grows the full tree, then considers its subtrees and uses a
cross-validated metric to score each of them. Here, a subtree means a tree with
the same root as the original tree but without some branches.
8. What is meant by pattern? CO-4 K-2
A pattern means that the data (visual or not) are correlated, that is, they have
a relationship and are predictable. When there is no pattern, there is true
randomness. When you find a pattern, you can have a good idea of when or where
something will happen before it actually happens.
9. What are outliers? CO-4 K-2
An outlier is a value in a set of data that is very different from the other
values. In simple terms, outliers are values unusually far from the middle.
Outliers have a significant impact on the mean but little impact on the median
or mode, so they matter most when the mean is used.
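The sketch below shows both points: one extreme value pulls the mean up while the median barely moves, and a simple rule flags it. The flagging rule used here is Tukey's 1.5 × IQR fences, a swapped-in technique chosen for illustration rather than the only definition of an outlier; the data are made up.

```python
from statistics import mean, median, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 100]  # 100 is a made-up outlier
without_outlier = [v for v in data if v != 100]

print(iqr_outliers(data))                              # [100]
print(round(mean(data), 2), median(data))              # 20.18 12
print(mean(without_outlier), median(without_outlier))  # 12.2 12.0
```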
10. Define centroid of the cluster. CO-4 K-2
A centroid is the imaginary or real location representing the center of the
cluster, usually the mean of its points. Every data point is allocated to one of
the clusters so as to reduce the in-cluster sum of squares.
11. What are the hierarchical methods used in clustering? CO-4 K-2
There are two types of hierarchical clustering:
• Agglomerative hierarchical clustering
• Divisive hierarchical clustering
12. What are Bayesian classifiers? CO-4 K-2
Bayesian classifiers are statistical classifiers based on Bayes' theorem; they
predict class membership probabilities, such as the probability that a given
tuple belongs to a particular class. Naive Bayes classifiers are not a single
algorithm but a family of algorithms that share a common principle: every pair of
features is assumed to be conditionally independent given the class.
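A minimal naive Bayes sketch for categorical attributes: it estimates P(C) and each P(x_i | C) by counting, then picks the class with the largest product. The training data are hypothetical, and for brevity the sketch omits Laplace smoothing, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def classify(model, row):
    """Return the class C that maximizes P(C) * product of P(x_i | C)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, value in enumerate(row):
            # Naive assumption: attributes are conditionally independent given C.
            # No Laplace smoothing here, so unseen values give probability 0.
            score *= value_counts[(c, i)][value] / n_c
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical training data: (age, student) -> buys_computer
rows = [("youth", "no"), ("youth", "yes"), ("senior", "yes"),
        ("senior", "no"), ("youth", "yes"), ("senior", "yes")]
labels = ["no", "yes", "yes", "no", "yes", "yes"]
model = train(rows, labels)
print(classify(model, ("youth", "yes")))   # yes
print(classify(model, ("senior", "no")))   # no
```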
13. Write notes on k-means algorithm. CO-4 K-2
The k-means algorithm selects k centroids and then allocates every data point to
the nearest cluster, while keeping the clusters as compact as possible (that is,
minimizing the within-cluster sum of squared distances). The assignment and
centroid-update steps are repeated until the clusters no longer change. The
'means' in k-means refers to averaging of the data, that is, finding the centroid.
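The two alternating steps can be sketched as follows (Lloyd's algorithm) for 2-D points. The sketch assumes the caller supplies the k initial centroids; practical implementations usually choose them randomly or with k-means++ and run several restarts. The points are made up.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means for 2-D points, starting from the given initial centroids."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(clusters)   # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
print(centroids)
```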
14. List out the density based methods. CO-4 K-2
• DBSCAN
• DENCLUE
15. List out the partitioning methods. CO-4 K-2
• k-means
• k-medoids (PAM)
• CLARA
• CLARANS
16. Define attribute-oriented induction. CO-4 K-2
Attribute-oriented induction summarizes the information in a relational
database by repeatedly replacing specific attribute values with more general
concepts according to user-defined concept hierarchies.
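A small sketch of the idea: each attribute value is replaced by its higher-level concept, and identical generalized tuples are then merged with a count. The concept hierarchies (city to country, raw age to age group) and the customer records are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchies (assumed for illustration)
CITY_TO_COUNTRY = {"Chennai": "India", "Mumbai": "India", "Toronto": "Canada"}


def age_group(age):
    """Generalize a raw age to a higher-level concept."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle_aged"
    return "senior"


def attribute_oriented_induction(records):
    """Replace each value by its higher-level concept, then merge identical tuples with a count."""
    generalized = [(CITY_TO_COUNTRY[city], age_group(age)) for city, age in records]
    return Counter(generalized)


customers = [("Chennai", 25), ("Mumbai", 28), ("Toronto", 45), ("Chennai", 70)]
for generalized_tuple, count in attribute_oriented_induction(customers).items():
    print(generalized_tuple, count)
```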
17. Write a note on: Bayes theorem. CO-4 K-2
Bayes' theorem is a mathematical formula used to determine the conditional
probability of an event. Conditional probability is the likelihood that an event
will occur, given that another event (a previous outcome) has occurred.
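In symbols, for a hypothesis H (for example, that a tuple belongs to some class) and observed evidence X, Bayes' theorem reads as follows. The worked numbers are hypothetical and only show the arithmetic: a rare hypothesis stays fairly unlikely even after fairly strong evidence.

```latex
P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)},
\qquad
P(X) = P(X \mid H)\,P(H) + P(X \mid \neg H)\,P(\neg H)

% Example: P(H) = 0.01,\; P(X \mid H) = 0.9,\; P(X \mid \neg H) = 0.05
P(X) = 0.9 \times 0.01 + 0.05 \times 0.99 = 0.0585,
\qquad
P(H \mid X) = \frac{0.009}{0.0585} \approx 0.154
```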
18. Define: Data generalization. CO-4 K-2
Data generalization is the process of summarizing data by replacing relatively
low-level values with higher-level concepts. It is a form of descriptive data
mining.
19. Define: Cluster. CO-4 K-2
A cluster is a collection of data objects that are similar to one another within
the same cluster and dissimilar to the objects in other clusters.
20. What is an outlier? CO-4 K-2
An outlier is a data object that deviates significantly from the rest of the data
objects and behaves in a different manner. Outliers can be caused by measurement
or execution errors.
UNIT 5
Cluster Analysis: Introduction – Types of Data in Cluster Analysis – Partitioning Methods – Hierarchical Methods –
Density-Based Methods – Grid-Based Methods – Model-Based Clustering Methods.
PART A – 2 MARKS
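1. Define CLARA. CO-5 K-2
CLARA (Clustering LARge Applications) is a sampling-based k-medoids method.
Instead of using the whole data set, it draws multiple random samples, applies
PAM (Partitioning Around Medoids) to each sample, and returns the best clustering
found.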
2. Define CLARANS. CO-5 K-4
CLARANS (Clustering Large Applications based upon RANdomized Search) is a
partitioning (k-medoids) method of clustering that is particularly useful in
spatial data mining, that is, recognizing patterns and relationships existing in
spatial data (such as distance-related, direction-related or topological data,
e.g. data plotted on a road map).
3. Differentiate agglomerative and divisive hierarchical clustering. CO-5 K-2
Agglomerative: This is a "bottom-up" approach: each observation starts in
its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top-down" approach: all observations start in one cluster,
and splits are performed recursively as one moves down the hierarchy.
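The bottom-up (agglomerative) approach can be sketched in a few lines. This version assumes 1-D points and single linkage (cluster distance = distance between the closest members), and it stops once k clusters remain instead of building the full dendrogram; the points are made up.

```python
def agglomerative(points, k):
    """Bottom-up clustering of 1-D points with single linkage.

    Every point starts in its own cluster; the two closest clusters are merged
    repeatedly until only k clusters remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


print(agglomerative([1, 2, 6, 7, 15], k=3))  # [[1, 2], [6, 7], [15]]
print(agglomerative([1, 2, 6, 7, 15], k=2))  # [[1, 2, 6, 7], [15]]
```

A divisive method would run in the opposite direction, starting from one cluster and splitting it.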
4. What is DBSCAN? CO-5 K-2
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a base
algorithm for density-based clustering. It can discover clusters of different
shapes and sizes from a large amount of data that contains noise and outliers.
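A minimal DBSCAN sketch for 1-D points. It assumes the usual convention that a point counts itself when checking whether it has at least min_pts neighbours within distance eps; core points grow a cluster, border points join it without expanding it, and everything else is labelled -1 (noise). The data are made up, and the neighbourhood search is a naive O(n^2) scan.

```python
def dbscan(points, eps, min_pts):
    """Cluster 1-D points; returns a label per point (0, 1, ...) or -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels = [None] * len(points)

    def region(i):
        return [j for j in range(len(points)) if abs(points[i] - points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


points = [1.0, 1.2, 1.4, 5.0, 5.1, 5.3, 9.0]
print(dbscan(points, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```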
5. What is STING? CO-5 K-4
STING (STatistical INformation Grid) is a grid-based clustering technique. In
STING, the spatial area is divided recursively into rectangular cells in a
hierarchical manner: each cell at a higher level is partitioned into a number of
smaller cells at the next lower level. Statistical measures of each cell (such as
count, mean, standard deviation, minimum and maximum) are precomputed and stored,
which helps answer queries as quickly as possible.
6. What are interval-scaled variables? CO-5 K-2
Interval-scaled variables are measured on an interval scale: a quantitative
measurement scale where there is order, the difference between two values is
meaningful and equal, and the position of zero is arbitrary. Such a scale
measures variables that exist along a common scale at equal intervals.
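Before clustering, interval-scaled variables are commonly standardized so that the choice of measurement unit does not dominate the distance computation. This sketch assumes the mean-absolute-deviation variant of the z-score, which is less affected by outliers than the standard deviation; the heights are hypothetical.

```python
def standardize(values):
    """z-scores using the mean absolute deviation s = (1/n) * sum(|x - m|)."""
    n = len(values)
    m = sum(values) / n
    s = sum(abs(x - m) for x in values) / n
    return [(x - m) / s for x in values]


heights_cm = [150, 160, 170, 180]
print(standardize(heights_cm))  # [-1.5, -0.5, 0.5, 1.5]
```

Because only differences between values matter on an interval scale, shifting every value by the same constant (moving the arbitrary zero) leaves the standardized values unchanged.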
7. What is CURE? CO-5 K-2
CURE (Clustering Using Representatives) is an efficient data clustering
algorithm for large databases. Compared with k-means clustering, it is more
robust to outliers and able to identify clusters having non-spherical shapes and
size variances.
8. What is clustering? CO-5 K-4
Clustering is the task of dividing the population or data points into a number of
groups such that data points in the same group are more similar to one another
than to data points in other groups. In simple words, the aim is to segregate
groups with similar traits and assign them into clusters.
9. What is prediction? CO-5 K-2
Predictive data mining is data mining that is done for the purpose
of using business intelligence or other data to forecast or predict trends. This type
of data mining can help business leaders make better decisions and can add value
to the efforts of the analytics team.
10. What is hierarchical clustering? CO-5 K-2
Hierarchical clustering is a set of methods that build a hierarchy of clusters by
recursively merging or splitting clusters, two at a time. There are basically two
different types of algorithms, agglomerative and divisive (partitioning). In
divisive algorithms, the entire set of items starts in one cluster, which is
partitioned into two more homogeneous clusters.
11. What is model based method? CO-5 K-4
Model-based methods hypothesize a model for each of the clusters and find the
best fit of the data to the given model. A model-based algorithm may locate
clusters by constructing a density function that reflects the spatial
distribution of the data points.
12. Define web mining. CO-5 K-2
Web mining is the use of data mining techniques to automatically discover and
extract information from Web documents and services. The main purpose of web
mining is discovering useful information from the World Wide Web and its usage
patterns.
13. What is a multimedia database? CO-5 K-2
Multimedia databases are used to store multimedia data such as images, animation,
audio, and video along with text. This data is stored in the form of
multiple file types like
14. Define web content mining. CO-5 K-4
Content mining is the browsing and mining of text, images, and graphs of
a Web page to decide the relevance of the content to the search query. This
browsing is done after the clustering of web pages through structure mining and
supports the results depending upon the method of relevance to the suggested
query.
15. Define web structure mining. CO-5 K-2
Web structure mining, one of three categories of web mining for data, is a tool
used to identify the relationship between Web pages linked by information or
direct link connection. It offers information about how different pages are linked
together to form this huge web.
16. Define web usage mining. CO-5 K-4
Web usage mining, a subset of data mining, is the extraction of interesting usage
patterns from Web data, such as server access logs, browsing histories, and click
streams, in order to understand the behaviour of users of the World Wide Web
(WWW).
17. What is spatial mining? CO-5 K-2
Spatial data mining is the application of data mining to spatial models. In spatial
data mining, analysts use geographical or spatial information to produce business
intelligence or other results. This requires specific techniques and resources to
get the geographical data into relevant and useful formats.
18. What is time series analysis? CO-5 K-2
Time series analysis is a specific way of analyzing a sequence of data points
collected over an interval of time. In time series analysis, analysts record data
points at consistent intervals over a set period of time rather than just
recording the data points intermittently or randomly.
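One of the simplest tools for identifying the trend in such a series is a moving average, which smooths out short-term fluctuations. The sketch assumes a simple (unweighted) moving average and hypothetical monthly sales recorded at equal intervals.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


monthly_sales = [10, 12, 14, 13, 15, 18]  # hypothetical observations at equal intervals
print(moving_average(monthly_sales, 3))
```

A larger window smooths more aggressively but reacts more slowly to genuine changes in the series.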
19. Define sequence mining. CO-5 K-4
Sequence mining (sequential pattern mining) is the discovery of frequently
occurring ordered events or subsequences as patterns in a sequence database. An
example pattern is that customers who buy a digital camera later buy a memory
card.
20. Define graph mining. CO-5 K-4
Graph Mining is the set of tools and techniques used to (a) analyze the properties
of real-world graphs, (b) predict how the structure and properties of a given
graph might affect some application, and (c) develop models that can generate
realistic graphs that match the patterns found in real-world graphs of interest.