
GENERAL REVIEW

PREPARED BY:
AHMAD ASLAM
F17-BSIT-3031
BS (H) IT 6TH SEMESTER EVE
UNIVERSITY OF OKARA

Data mining:
Data mining is a process that uses a variety of data analysis tools to find patterns and relationships in data
that can be used to make valid predictions. Data mining assists business analysts with finding patterns and
relationships in data. It does not tell you the value of those patterns to the organization. In addition, the
patterns obtained by data mining must be validated in the real world. The first and simplest analytical step in
data mining is to describe the data: summarize its statistical attributes (such as means and standard deviations),
review it visually using charts and graphs, and look for possible links between variables (such as values that
often occur together). As the data mining process emphasizes, collecting, exploring, and selecting the right data
is critically important.
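The descriptive step above can be sketched in a few lines of Python. This is only an illustration: the sample values and the names `spend`, `visits`, `describe`, and `pearson` are assumptions made for the example, not part of the original notes. It summarizes one attribute by its mean and standard deviation and then measures a possible linear link between two variables.

```python
import statistics

# Hypothetical sample: monthly spend and number of store visits for six customers.
spend = [120.0, 80.0, 150.0, 95.0, 200.0, 75.0]
visits = [4, 2, 5, 3, 7, 2]


def describe(values):
    """Return the mean and sample standard deviation of a numeric attribute."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation: a simple measure of a linear link between two variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


spend_summary = describe(spend)
link = pearson(spend, visits)
print(spend_summary, round(link, 3))
```

A correlation close to 1 or -1 only suggests a link worth investigating; as the text notes, any pattern still has to be validated in the real world.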
Data mining applications:
Data mining is becoming more and more popular because of the substantial contribution it can make. It can be
used to control costs as well as to increase revenue. Many organizations use data mining to help manage all
stages of the customer life cycle, including acquiring new customers, increasing revenue from existing customers,
and retaining good customers. Data mining offers value across a broad range of industries.
Telecommunications and credit card companies are two of the leaders in using data mining to detect fraudulent
use of their services.
Data sets are made up of data objects. A data object represents an entity.
Examples:
sales database: customers, store items, sales
medical database: patients, treatments
university database: students, professors, courses
Data objects are described by attributes.
Attribute:
A data field, representing a characteristic or feature of a data object.
Data Preprocessing:
Includes data cleaning, data integration, data transformation, and data reduction.
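Two of these steps can be illustrated with a minimal sketch. The notes do not say which techniques are meant, so the choices here are assumptions: mean imputation stands in for data cleaning and min-max scaling for data transformation. The `ages` values and both function names are hypothetical.

```python
# Hypothetical raw attribute values; None marks a missing entry.
ages = [25, None, 35, 45, None, 55]


def fill_missing_with_mean(values):
    """Data cleaning: replace missing entries with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = fill_missing_with_mean(ages)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```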
Data Warehousing:
Provides architectures and tools that let business managers organize, understand, and use their data to make
strategic decisions. Data warehousing systems are important tools in today's competitive, ever-changing world.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support
of management's decision-making process". This short but comprehensive definition
introduces the major features of a data warehouse. The four keywords (subject-oriented, integrated, time-variant,
and nonvolatile) distinguish data warehouses from other data repository systems, such as relational database
systems, transaction processing systems, and file systems.

Data cubes:
Facilitate the online analytical processing (OLAP) of multidimensional data. Data cube computation is an important
task in data warehouse implementation. Precomputing all or part of a data cube can significantly reduce response
time and improve the performance of online analytical processing. However, such computation is challenging because
it may require considerable computational time and storage space.
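The trade-off described above can be seen in a small sketch that precomputes every group-by (cuboid) of a tiny fact table. The table contents, the `DIMENSIONS` names, and the function `compute_full_cube` are hypothetical, and summing is just one possible aggregate. With n dimensions the full cube has 2^n cuboids, which is where the time and storage cost comes from.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (city, item, year, sales amount).
DIMENSIONS = ("city", "item", "year")
facts = [
    ("Okara", "phone", 2019, 100),
    ("Okara", "laptop", 2019, 300),
    ("Lahore", "phone", 2019, 150),
    ("Lahore", "phone", 2020, 200),
]


def compute_full_cube(rows, n_dims):
    """Precompute the SUM of the measure for every subset of the dimensions."""
    cube = {}
    for k in range(n_dims + 1):
        for dims in combinations(range(n_dims), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[n_dims]  # the measure is the last column
            cube[dims] = dict(totals)
    return cube


cube = compute_full_cube(facts, len(DIMENSIONS))

# After precomputation, analytical queries are plain dictionary lookups.
total_sales = cube[()][()]                   # grand total
lahore_sales = cube[(0,)][("Lahore",)]       # group by city
phone_2019 = cube[(1, 2)][("phone", 2019)]   # group by item and year
print(len(cube), total_sales, lahore_sales, phone_2019)
```

Materializing only some of the cuboids (partial precomputation) reduces storage at the price of computing the remaining group-bys at query time.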

Data classification:
Data classification is a two-step process, which includes a learning step (where a classification model is built)
and a classification step (where the model is used to predict class labels for given data). In the first step, a
classifier is built describing a predetermined set of data classes or concepts. This is the learning step (or
training phase), where a classification algorithm builds the classifier by analyzing or "learning from" a training
set made up of database tuples and their associated class labels. A tuple, X, is represented by an n-dimensional
attribute vector, X = (x1, x2, ..., xn), which holds the n measurements made on the tuple from n database
attributes, respectively, A1, A2, ..., An. Each tuple, X, is assumed to belong to a predefined class as determined
by another database attribute called the class label attribute. The class label attribute is discrete-valued and
unordered. It is categorical (or nominal) in that each value serves as a category or class. The individual tuples
making up the training set are referred to as training tuples and are sampled randomly from the database under
analysis. In the context of classification, data tuples can be called samples, examples, instances, data points,
or objects.
In the second step the model is used for classification. First, the predictive accuracy of the classifier is
estimated. If we were to use the training set to measure the classifier's accuracy, this estimate would likely be
optimistic, since the classifier tends to overfit the data (i.e., during learning it may incorporate some
anomalies of the training data that are not present in the general data set). Therefore, a test set is used, made
up of test tuples and their associated class labels. They are independent of the training tuples, meaning they
were not used to build the classifier. The accuracy of a classifier on a given test set is the percentage of test
set tuples that are correctly classified by the classifier.
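The two steps and the accuracy estimate can be sketched as follows. The notes do not name a classification algorithm, so a 1-nearest-neighbour classifier is used here purely as an assumption, chosen because its accuracy on its own training set is trivially 100% and so shows why that estimate is optimistic. The data and the names `train_1nn`, `predict`, and `accuracy` are hypothetical.

```python
import random


def train_1nn(training_tuples):
    """Learning step: a 1-nearest-neighbour classifier simply stores the training tuples."""
    return list(training_tuples)


def predict(model, x):
    """Classification step: return the class label of the closest stored tuple."""
    def squared_distance(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    return min(model, key=squared_distance)[1]


def accuracy(model, labelled_tuples):
    """Percentage of the given tuples whose class label is predicted correctly."""
    correct = sum(1 for x, label in labelled_tuples if predict(model, x) == label)
    return 100.0 * correct / len(labelled_tuples)


# Hypothetical data set: (attribute vector X, class label).
data = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
    ((2.5, 2.5), "low"), ((3.0, 1.5), "low"),
    ((6.0, 6.5), "high"), ((6.5, 7.0), "high"), ((7.0, 6.0), "high"),
    ((7.5, 8.0), "high"), ((8.0, 7.0), "high"),
]

# Randomly sample training tuples; the remaining tuples form an independent test set.
rng = random.Random(0)
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

model = train_1nn(train)
train_accuracy = accuracy(model, train)  # optimistic: measured on tuples the model has seen
test_accuracy = accuracy(model, test)    # measured on tuples not used to build the model
print(train_accuracy, test_accuracy)
```

Only the test-set figure says anything about how the classifier behaves on tuples it has not seen.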
Decision tree induction:
Is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree
structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an
outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in the tree is
the root node.
How are decision trees used for classification? Given a tuple, X, for which the associated class label is unknown,
the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf
node, which holds the class prediction for that tuple. Decision trees can be easily converted into classification
rules.
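Both uses of a decision tree described above, classifying a tuple and converting the tree to rules, can be sketched with a hand-written tree. The tree, its attribute names, and the functions `classify` and `to_rules` are assumptions for illustration. The sketch does not perform induction, that is, it does not learn the tree from training tuples.

```python
# Hypothetical tree: internal nodes are dicts, leaf nodes are class-label strings.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "buys", "excellent": "does_not_buy"},
        },
    },
}


def classify(node, tuple_values):
    """Trace a path from the root to a leaf and return the leaf's class label."""
    while isinstance(node, dict):
        outcome = tuple_values[node["attribute"]]  # result of the test at this node
        node = node["branches"][outcome]           # follow the matching branch
    return node


def to_rules(node, conditions=()):
    """Convert each root-to-leaf path into one IF-THEN classification rule."""
    if not isinstance(node, dict):
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node}"]
    rules = []
    for outcome, child in node["branches"].items():
        rules.extend(to_rules(child, conditions + ((node["attribute"], outcome),)))
    return rules


x = {"age": "youth", "student": "yes", "credit_rating": "fair"}
label = classify(tree, x)
rules = to_rules(tree)
print(label)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so this tree with five leaves produces five rules.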
