
Exp1

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Types of data visualization charts

1. Line chart
A line chart illustrates changes over time.

2. Area chart
An area chart is an adaptation of a line chart in which the area below the line is filled in.

3. Bar chart
A bar chart compares quantities across categories; it can also illustrate changes over time.

4. Histogram
A histogram looks like a bar chart, but measures frequency rather than trends over time.

5. Scatter plot
Scatter plots are used to find correlations between variables.

6. Bubble chart
A bubble chart is an adaptation of a scatter plot in which the size of each point encodes a third variable.

7. Pie chart
A pie chart is the best option for illustrating percentages, since it shows each element as a proportion of a whole.

8. Gauge
A gauge illustrates where a single value falls within a defined range or between set intervals.

9. Map
Much of the data dealt with in businesses has a location element, which makes it easy to illustrate on a map.

10. Heat map
A heat map is basically a color-coded matrix.

11. Frame diagram
Frame diagrams are basically tree diagrams that clearly show hierarchical relationship structures.
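
As a minimal illustration of a few of these chart types, the following Python sketch (assuming matplotlib is installed; the sample data is made up) draws a line chart, a bar chart, and a scatter plot:

import matplotlib.pyplot as plt

# Made-up sample data for illustration only
months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 128, 150, 162]
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Line chart: changes over time
axes[0].plot(months, sales)
axes[0].set_title("Line chart")

# Bar chart: comparing quantities across categories
axes[1].bar(months, sales)
axes[1].set_title("Bar chart")

# Scatter plot: looking for a correlation between two variables
axes[2].scatter(x, y)
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()
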
Exp2
Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format.

Steps Involved in Data Preprocessing

1. Data Cleaning:
The data can have many irrelevant and missing parts. Data cleaning is done to handle them; it involves handling missing data, noisy data, etc.
(a) Missing Data:
This situation arises when some values are missing from the data. It can be handled in various ways, some of which are:

1. Ignore the tuples:
This approach is suitable only when the dataset is quite large and multiple values are missing within a tuple.
2. Fill the missing values:
There are various ways to do this task. You can choose to fill the missing values manually, with the attribute mean, or with the most probable value.
(b) Noisy Data:
Noisy data is meaningless data that cannot be interpreted by machines. It can be generated by faulty data collection, data entry errors, etc. It can be handled in the following ways:
1. Binning Method:
This method works on sorted data in order to smooth it. The data is divided into segments of equal size, and each segment is handled separately: every value in a segment can be replaced by the segment mean, or by the nearest bin boundary (see the sketch after this list).
2. Regression:
Here the data can be smoothed by fitting it to a regression function. The regression may be linear (one independent variable) or multiple (several independent variables).
3. Clustering:
This approach groups similar data into clusters; outliers either go undetected or fall outside the clusters.
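
A minimal sketch of smoothing by bin means, in plain Python with made-up values:

# Smoothing noisy data by bin means (equal-size bins over sorted data)
data = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # made-up, already sorted
bin_size = 4

smoothed = []
for start in range(0, len(data), bin_size):
    segment = data[start:start + bin_size]
    mean = sum(segment) / len(segment)
    # Replace every value in the segment by the segment mean
    smoothed.extend([mean] * len(segment))

print(smoothed)
# [9.0, 9.0, 9.0, 9.0, 22.75, 22.75, 22.75, 22.75, 29.25, 29.25, 29.25, 29.25]
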
2. Data Transformation:
This step is taken in order to transform the data into forms appropriate for the mining process. It involves the following ways:
1. Normalization:
It is done in order to scale the data values into a specified range, such as -1.0 to 1.0 or 0.0 to 1.0 (see the sketch after this list).
2. Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the
mining process.
3. Discretization:
This is done to replace the raw values of a numeric attribute with interval levels or conceptual levels.
4. Concept Hierarchy Generation:
Here attributes are converted from a lower level to a higher level in the hierarchy. For example, the attribute “city” can be converted to “country”.
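
A minimal sketch of min-max normalization to the range 0.0 to 1.0, in plain Python with made-up values:

# Min-max normalization: rescale attribute values into [0.0, 1.0]
values = [200, 300, 400, 600, 1000]  # made-up attribute values

lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(normalized)  # [0.0, 0.125, 0.25, 0.5, 1.0]
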
3. Data Reduction:
Data mining is used to handle huge amounts of data, and analysis becomes harder as the volume grows. To deal with this, we use data reduction techniques, which aim to increase storage efficiency and reduce data storage and analysis costs.
The various steps to data reduction are:
1. Data Cube Aggregation:
Aggregation operations are applied to the data to construct the data cube.
2. Attribute Subset Selection:
Only the highly relevant attributes should be used; the rest can be discarded.
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example regression models.
4. Dimensionality Reduction:
This reduces the size of the data through encoding mechanisms. It can be lossy or lossless: if the original data can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is called lossy. Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).
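
A minimal PCA sketch, assuming scikit-learn is installed and using randomly generated data purely for illustration:

import numpy as np
from sklearn.decomposition import PCA

# Made-up data: 100 samples with 10 correlated features
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
data = base @ rng.normal(size=(3, 10))  # 10 features driven by 3 hidden factors

# Reduce the 10 features to 3 principal components
pca = PCA(n_components=3)
reduced = pca.fit_transform(data)

print(reduced.shape)                  # (100, 3)
print(pca.explained_variance_ratio_)  # variance captured by each component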

Exp3
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each
internal node denotes a test on an attribute, each branch denotes the outcome of a test, and
each leaf node holds a class label. The topmost node in the tree is the root node.

The benefits of having a decision tree are as follows −

• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
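
A minimal decision tree sketch, assuming scikit-learn is installed and using its bundled Iris dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small labeled dataset
iris = load_iris()

# Learning step: build the tree from the data
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node tests an attribute; each leaf holds a class label
print(export_text(tree, feature_names=iris.feature_names))

# Classification step: predict the class of a new, unseen sample
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))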

Exp4
Classifiers:
• Bayes: performs density estimation for numerical attributes.
• Meta: performs multi-response linear regression.
• Functions: performs logistic regression.
• Lazy: sets the blend entropy automatically.
• Rule: a rule learner.
• Trees: classifies the data using decision trees.

Exp6
Python is a popular programming language. It was created by Guido van Rossum,
and released in 1991.
It is used for:
● web development (server-side),
● software development,
● mathematics,
● system scripting.

XAMPP is an abbreviation where X stands for Cross-Platform, A stands for Apache, M stands for MySQL, and the Ps stand for PHP and Perl, respectively. It is an open-source package of web solutions that includes the Apache distribution for many servers and command-line executables, along with modules such as the Apache server, MariaDB, PHP, and Perl.

Exp7
K-Nearest Neighbor (KNN) Algorithm:
o K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
o The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at classification time.
o At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
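
A minimal K-NN classification sketch, assuming scikit-learn is installed and using its bundled Iris dataset for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load data and hold out a test set
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# "Training" just stores the dataset (K-NN is a lazy learner)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# A new point is classified by the majority class of its 5 nearest neighbors
print(knn.score(X_test, y_test))  # accuracy on the held-out data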

Exp8
Classifiers create boundaries in instance space. Different classifiers have different biases. You
can explore them by visualizing the classification boundaries.
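
A minimal boundary-visualization sketch, assuming scikit-learn and matplotlib are installed, comparing the boundaries of two classifiers on made-up 2-D data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up 2-D data so the boundaries can be drawn directly
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# A grid covering instance space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, model in zip(axes, [KNeighborsClassifier(n_neighbors=5),
                            DecisionTreeClassifier(random_state=0)]):
    model.fit(X, y)
    # Predict every grid point to reveal the classifier's bias
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    ax.set_title(type(model).__name__)
plt.tight_layout()
plt.show()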

Exp9
What is Clustering?
Clustering is the process of grouping a set of abstract objects into classes of similar objects.
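
A minimal k-means clustering sketch, assuming scikit-learn is installed and using randomly generated points purely for illustration:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Made-up data: 150 points scattered around 3 centers
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Group the points into 3 clusters of similar objects
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)  # coordinates of the 3 cluster centers
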
Applications of Cluster Analysis
● Clustering analysis is broadly used in many applications such as market
research, pattern recognition, data analysis, and image processing.
● Clustering can also help marketers discover distinct groups in their customer
base and characterize those groups based on purchasing patterns.
● In the field of biology, it can be used to derive plant and animal taxonomies,
categorize genes with similar functionalities and gain insight into structures
inherent to populations.
● Clustering also helps in identification of areas of similar land use in an earth
observation database. It also helps in the identification of groups of houses in
a city according to house type, value, and geographic location.
● Clustering also helps in classifying documents on the web for information
discovery.
● Clustering is also used in outlier detection applications such as detection of
credit card fraud.
● As a data mining function, cluster analysis serves as a tool to gain insight into
the distribution of data to observe characteristics of each cluster.
Requirements of Clustering in Data Mining
The following points throw light on why clustering is required in
data mining −
● Scalability − We need highly scalable clustering algorithms to deal with
large databases.
● Ability to deal with different kinds of attributes − Algorithms should
be capable of being applied to any kind of data, such as interval-based
(numerical), categorical, and binary data.
● Discovery of clusters with arbitrary shape − The clustering algorithm
should be capable of detecting clusters of arbitrary shape. It should not
be bounded to only distance measures that tend to find spherical clusters
of small size.
● High dimensionality − The clustering algorithm should be able to handle
not only low-dimensional data but also high-dimensional data.
● Ability to deal with noisy data − Databases contain noisy, missing or
erroneous data. Some algorithms are sensitive to such data and may lead to poor
quality clusters.
● Interpretability − The clustering results should be interpretable,
comprehensible, and usable.

CRM: Customer Relationship Management

Types of CRM:
• Strategic CRM: customer-centric; based on acquiring and maintaining profitable customers.
• Operational CRM: based on customer-oriented processes such as selling, marketing, and customer service.
• Analytical CRM: based on the intelligent mining of customer data and using it tactically for future strategies.
• Collaborative CRM: based on the application of technology across organization boundaries with a view to optimizing the organization and its customers.
