You are on page 1of 4

1. What is data characterization?

Data characterization is a summarization of the general


characteristics or features of a target class of data. The data
corresponding to the user-specified class are typically collected by a
query. For example, to study the characteristics of software products
with sales that increased by 10% in the previous year, the data related
to such products can be collected by executing an SQL query on the
sales database.

2 Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving Classification
problems. It is a tree-structured classifier, where internal nodes represent the features
of a dataset, branches represent the decision rules and each leaf node represents the
outcome.

3 Association rule mining finds interesting associations and relationships


among large sets of data items. This rule shows how frequently a itemset
occurs in a transaction. A typical example is a Market Based Analysis.
Market Based Analysis is one of the key techniques used by large relations to
show associations between items.It allows retailers to identify relationships
between the items that people buy together frequently.

4 IF-THEN Rules
Rule-based classifier makes use of a set of IF-THEN rules for classification. We can
express a rule in the following from −
IF condition THEN conclusion
Let us consider a rule R1,
R1: IF age = youth AND student = yes
THEN buy_computer = yes
Points to remember −
 The IF part of the rule is called rule antecedent or precondition.
 The THEN part of the rule is called rule consequent.
 The antecedent part the condition consist of one or more attribute tests and these
tests are logically ANDed.
 The consequent part consists of class prediction.

5 A support vector machine (SVM) is a type of deep learning algorithm that


performs supervised learning for classification or regression of data groups.

In AI and machine learning, supervised learning systems provide both input


and desired output data, which are labeled for classification. The classification
provides a learning basis for future data processing. Support vector machines
are used to sort two data groups by like classification. The algorithms draw
lines (hyperplanes) to separate the groups according to patterns.

6 Classification is to identify the category or the class label of a new observation. First, a set of
data is used as training data. The set of input data and the corresponding outputs are given to the
algorithm. So, the training data set includes the input data and their associated class labels. Using
the training dataset, the algorithm derives a model or the classifier. The derived model can be a
decision tree, mathematical formula, or a neural network. In classification, when unlabeled data
is given to the model, it should find the class to which it belongs. The new data provided to the
model is the test data set.

The data classification life cycle produces an excellent structure for controlling the flow of
data to an enterprise. Businesses need to account for data security and compliance at
each level. With the help of data classification, we can perform it at every stage, from
origin to deletion. The data life-cycle has the following stages, such as:
1. Origin: It produces sensitive data in various formats, with emails, Excel, Word,
Google documents, social media, and websites.
2. Role-based practice: Role-based security restrictions apply to all delicate data by
tagging based on in-house protection policies and agreement rules.
3. Storage: Here, we have the obtained data, including access controls and
encryption.
4. Sharing: Data is continually distributed among agents, consumers, and co-workers
from various devices and platforms.
5. Archive: Here, data is eventually archived within an industry's storage systems.
6. Publication: Through the publication of data, it can reach customers. They can
then view and download in the form of dashboards.
7. Another process of data analysis is prediction. It is used to find a numerical output.
Same as in classification, the training dataset contains the inputs and
corresponding numerical output values. The algorithm derives the model or a
predictor according to the training dataset. The model should find a numerical
output when the new data is given. Unlike in classification, this method does not
have a class label. The model predicts a continuous-valued function or ordered
value.
8. Regression is generally used for prediction. Predicting the value of a house
depending on the facts such as the number of rooms, the total area, etc., is an
example for prediction.
9. For example, suppose the marketing manager needs to predict how much a
particular customer will spend at his company during a sale. We are bothered to
forecast a numerical value in this case. Therefore, an example of numeric prediction
is the data processing activity. In this case, a model or a predictor will be developed
that forecasts a continuous or ordered value function.

You might also like