2.Area chart
An area chart is an adaptation of a line chart in which the area under the line is filled in.
3.Bar chart
A bar chart also illustrates changes over time.
4.Histogram
A histogram looks like a bar chart, but measures frequency rather than trends over
time.
5.Scatter plot
Scatter plots are used to find correlations.
6.Bubble chart
A bubble chart is an adaptation of a scatter plot in which each point is drawn as a bubble whose size represents an additional variable.
7.Pie chart
A pie chart is well suited to illustrating percentages, because it shows each element as part of a whole.
8.Gauge
A gauge can be used to illustrate the distance between intervals.
9.Map
Much of the data dealt with in businesses has a location element, which makes it easy
to illustrate on a map.
10.Heat map
A heat map is basically a color-coded matrix.
11.Frame diagram
Frame diagrams are basically tree maps that clearly show hierarchical
relationships.
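Two of the chart types above rest on simple computations: a histogram counts how many values fall into each interval, and a scatter plot is usually read alongside a correlation coefficient. The following is a minimal Python sketch of both, not a plotting recipe. The function names `histogram_counts` and `pearson_r`, the bin width, and the sample numbers are assumptions chosen for illustration.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin of width bin_width."""
    # A value belongs to the bin whose lower edge is the largest
    # multiple of bin_width that does not exceed the value.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical sample: ages of ten customers, binned by decade.
ages = [21, 25, 27, 31, 34, 35, 38, 42, 47, 58]
age_counts = histogram_counts(ages, 10)
for lower_edge, count in age_counts.items():
    # One '#' per observation gives a text-only histogram.
    print(f"{lower_edge}-{lower_edge + 9}: {'#' * count}")

# Hypothetical sample: advertising spend against sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(ad_spend, sales)
print(f"correlation: {r:.2f}")
```

The histogram output depends only on how often values occur, not on when they occurred, which is the difference from a bar chart of a trend over time.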
Exp2
Data preprocessing is a data mining technique used to transform raw data into
a useful and efficient format.
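As an illustration, two common preprocessing steps are filling in missing values and rescaling a numeric attribute. The sketch below is a minimal example under stated assumptions: the helper names `fill_missing_with_mean` and `min_max_scale`, the use of `None` to mark a missing value, and the income figures are all hypothetical, and real experiments may use other steps.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are equal, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Hypothetical raw attribute with one missing entry.
raw_incomes = [30000, None, 50000, 70000]
cleaned = fill_missing_with_mean(raw_incomes)
scaled = min_max_scale(cleaned)
print(cleaned)
print(scaled)
```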
Exp3
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes the outcome of a test, and
each leaf node holds a class label. The topmost node in the tree is the root node.
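The structure described above can be sketched in Python as follows. This is only an illustration of how a finished tree is used to classify a record, not a tree-learning algorithm. The class names `Node` and `Leaf`, the `classify` function, and the weather attributes are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    label: str          # class label held by the leaf node

@dataclass
class Node:
    attribute: str      # attribute tested at this internal node
    branches: dict      # test outcome -> child Node or Leaf

def classify(tree, record):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        outcome = record[tree.attribute]
        tree = tree.branches[outcome]
    return tree.label

# Hypothetical tree: the root node tests "outlook".
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(weather_tree, {"outlook": "sunny", "humidity": "high"}))
print(classify(weather_tree, {"outlook": "overcast"}))
```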
Exp4
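Classifiers:
• Bayes: Probabilistic classifiers such as naive Bayes, which can use density estimation for numerical attributes.
• Meta: Classifiers that combine or wrap other learners, for example by using multi-response linear regression.
• Functions: Function-based models such as logistic regression.
• Lazy: Instance-based learners that defer work until classification time; one of them sets the blend entropy automatically.
• Rule: Rule learners.
• Trees: Decision tree learners, which classify the data by following tests from the root node to a leaf.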
Exp6
Python is a popular programming language. It was created by Guido van Rossum,
and released in 1991.
It is used for:
● web development (server-side),
● software development,
● mathematics,
● system scripting.
Exp7
K-Nearest Neighbor (KNN) Algorithm:
o K-Nearest Neighbour is one of the simplest machine learning algorithms and is based on
the supervised learning technique.
o The K-NN algorithm measures the similarity between the new case and the available cases and
puts the new case into the category whose cases it is most similar to.
o The K-NN algorithm stores all the available data and classifies a new data point based on
similarity.
This means that when new data appears, it can easily be classified into a well-suited category
by using the K-NN algorithm.
o The K-NN algorithm can be used for regression as well as for classification, but it is mostly
used for classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption about the distribution of the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training
set immediately; instead it stores the dataset and, at the time of classification, performs an
action on the dataset.
o At the training phase the KNN algorithm just stores the dataset; when it gets new data,
it classifies that data into the category that is most similar to the new data.
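The points above can be sketched in a few lines of Python. This is a minimal illustration that assumes Euclidean distance as the similarity measure and a simple majority vote among the k nearest cases. The function name `knn_classify`, the value k=3, and the sample points are assumptions for the example.

```python
import math
from collections import Counter

def knn_classify(training_data, query, k=3):
    """Classify query by majority vote among its k nearest training cases.

    training_data is a list of (features, label) pairs. Nothing is learned
    in advance: the stored data is only examined when a query arrives,
    which is why K-NN is called a lazy learner.
    """
    by_distance = sorted(training_data,
                         key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical stored dataset with two categories.
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

prediction = knn_classify(training_data, (2.0, 2.0), k=3)
print(prediction)
```

Using an odd k with two categories avoids tied votes; other distance measures or weighted votes are equally valid choices.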
Exp8
Classifiers create boundaries in instance space. Different classifiers have different biases. You
can explore them by visualizing the classification boundaries.
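One way to see this without a plotting tool is to evaluate a classifier at every point of a small grid and print the predicted class. The sketch below compares two deliberately simple classifiers, a nearest-neighbour rule and a rule that tests a single attribute. Both classifiers, the two training points, and the 5 x 5 grid are assumptions chosen for illustration.

```python
# Hypothetical training data: one instance per class.
training_points = [((1, 1), "a"), ((3, 2), "b")]

def nearest_neighbour(x, y):
    """Predict the class of the closest training point (squared distance)."""
    def squared_distance(point):
        (px, py), _ = point
        return (px - x) ** 2 + (py - y) ** 2
    return min(training_points, key=squared_distance)[1]

def threshold_rule(x, y):
    """Predict from the x attribute alone, ignoring y."""
    return "a" if x < 2 else "b"

def render(classifier, size=5):
    """Return one string per grid row, with the largest y value first."""
    rows = []
    for y in range(size - 1, -1, -1):
        rows.append("".join(classifier(x, y) for x in range(size)))
    return rows

for name, classifier in [("nearest neighbour", nearest_neighbour),
                         ("threshold rule", threshold_rule)]:
    print(name)
    print("\n".join(render(classifier)))
```

The threshold rule can only produce a vertical boundary, while the nearest-neighbour boundary is slanted, which illustrates how each classifier's bias shapes the regions it can draw.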
Exp9
What is Clustering?
Clustering is the process of grouping a set of abstract objects into classes of similar
objects.
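As a concrete illustration, the sketch below groups points with k-means, which is one common clustering algorithm and is used here only as an example; the text above does not prescribe a particular method. The function name `k_means`, the fixed starting centroids, the iteration count, and the sample points are assumptions.

```python
import math

def k_means(points, centroids, iterations=10):
    """Group points around centroids by repeated assignment and averaging."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
print(clusters[0])
print(clusters[1])
```

Because k-means relies on distance to a centroid, it tends to find compact, roughly spherical groups.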
Applications of Cluster Analysis
● Clustering analysis is broadly used in many applications such as market
research, pattern recognition, data analysis, and image processing.
● Clustering can also help marketers discover distinct groups in their customer
base and characterize those groups based on their purchasing
patterns.
● In the field of biology, it can be used to derive plant and animal taxonomies,
categorize genes with similar functionalities and gain insight into structures
inherent to populations.
● Clustering also helps in identification of areas of similar land use in an earth
observation database. It also helps in the identification of groups of houses in
a city according to house type, value, and geographic location.
● Clustering also helps in classifying documents on the web for information
discovery.
● Clustering is also used in outlier detection applications such as detection of
credit card fraud.
● As a data mining function, cluster analysis serves as a tool to gain insight into
the distribution of data to observe characteristics of each cluster.
Requirements of Clustering in Data Mining
The following points describe what is required of clustering algorithms in
data mining −
● Scalability − We need highly scalable clustering algorithms to deal with
large databases.
● Ability to deal with different kinds of attributes − Algorithms should
be capable of being applied to any kind of data, such as interval-based
(numerical), categorical, and binary data.
● Discovery of clusters with arbitrary shape − The clustering algorithm
should be capable of detecting clusters of arbitrary shape. It should not
be limited to distance measures that tend to find only small spherical
clusters.
● High dimensionality − The clustering algorithm should be able to
handle not only low-dimensional data but also high-dimensional data.
● Ability to deal with noisy data − Databases contain noisy, missing, or
erroneous data. Some algorithms are sensitive to such data and may produce
poor-quality clusters.
● Interpretability − The clustering results should be interpretable,
comprehensible, and usable.
Analytical CRM: Based on the intelligent mining of customer data and its tactical use in
future strategies.