
Supervised Learning

Supervised learning is a technique used in the field of data mining. The goal of a supervised learning technique is to build a model that predicts a value for a continuous outcome or classifies a categorical outcome. We start by discussing how to partition the data set so that we can judge how well the model will perform in the future. Then we discuss how to measure how well classification and prediction methods work. Finally, we introduce three supervised learning techniques that are often used: k-nearest neighbors, classification and regression trees, and logistic regression, which will be discussed by the next reporter.

Partitioning Data

Traditional statistical methods determine how many sample data points are needed and then use confidence intervals and hypothesis tests to draw conclusions about the whole population from the sample data. Data mining applications, on the other hand, deal with very large amounts of data, which makes it easier to judge how accurate data-based estimates of variable effects are.

But a lot of data can make it tempting to fit a model too closely. Overfitting happens when an analyst builds a model that does a great job of explaining the sample of data it is based on but fails to make accurate predictions outside the sample data.
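
As a hedged illustration (the data and model choice here are assumptions, not from the text), the sketch below fits a very flexible model, a degree-15 polynomial, to a small sample: its error on the sample itself is tiny, but its error on fresh data drawn from the same process is far larger.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(seed=4)
    x = rng.uniform(-1, 1, size=(30, 1))
    y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.2, size=30)
    x_new = rng.uniform(-1, 1, size=(200, 1))
    y_new = np.sin(3 * x_new[:, 0]) + rng.normal(scale=0.2, size=200)

    # A degree-15 polynomial explains the 30 sample points almost perfectly...
    model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(x, y)
    print("in-sample MSE:    ", mean_squared_error(y, model.predict(x)))
    # ...but it typically predicts much worse on data outside that sample.
    print("out-of-sample MSE:", mean_squared_error(y_new, model.predict(x_new)))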

That is the reason the data is partitioned.

Data partitioning in data mining is the division of all the available data into two or three non-overlapping sets: the training set, the validation set, and the test set. If the data set is very large, often only a portion of it is selected for the partitions. Partitioning is normally used when the model for the data at hand is being chosen from a broad set of models. The basic idea of data partitioning is to keep a subset of the available data out of the analysis and to use it later to verify the model.
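
The text works with XLMiner, which performs this partitioning inside Excel; purely as an illustration, the same three-way split can be sketched in Python with scikit-learn (the 60/20/20 proportions and the toy data are assumptions, not rules from the text):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Toy data: 1000 observations, 5 input variables, 1 continuous outcome.
    rng = np.random.default_rng(seed=0)
    X = rng.normal(size=(1000, 5))
    y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=1000)

    # First split off the training set (60%), then divide the remainder
    # equally into validation (20%) and test (20%) sets.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    print(len(X_train), len(X_valid), len(X_test))  # 600 200 200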

You can train or build a model with the Training Set. For example, in linear regression the training set is used to fit the model (i.e., to compute the regression coefficients), and in a neural network model it is used to obtain the network weights. After fitting the model to the Training Set, the Validation Set should be used to test how well the model works.

Once a model has been built with the help of the Training Set, it must be tested with new data to make sure it works well. If the Training Set were used to judge how well the model fits, the result would be an overly optimistic estimate, because the training or model-fitting process makes the model's accuracy on the training data as good as it can be and fits the model to the training data as closely as possible. To get a more accurate idea of how the model would work with data it has never seen, we must set aside some of the original data and not use it when training the model. This collection of data is called the Validation Set. XLMiner measures the difference between the actual observed values and the predicted values of the observations to check how well the model works. This difference is called the prediction error, and it is used to measure how accurate the model is as a whole. Models are often fine-tuned with the help of the Validation Set. For example, you could try out neural network models with different architectures and check how well each one works on the Validation Set to pick the one that works best. Once a model is chosen, however, its accuracy on the Validation Set is still only an optimistic estimate of how well it will work with data that hasn't been seen yet, because out of all the competing models, the final model won precisely because it was the most accurate on the Validation Set. So it's a good idea to save another piece of the data that is used for neither training nor validation. This set is called the Test Set. The accuracy of the model on the test data gives a good idea of how well it will work on data it has never seen before.
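
As a minimal sketch of this fine-tuning workflow (the candidate models, k-nearest-neighbor regressors with different k, and the data are illustrative assumptions): fit each competitor on the Training Set, pick the one with the lowest Validation Set error, and report the winner's error on the held-out Test Set.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(seed=1)
    X = rng.normal(size=(1000, 5))
    y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=1000)

    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
    X_valid, X_test, y_valid, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

    # Try competing models on the Training Set and score each on the Validation Set.
    best_k, best_err = None, float("inf")
    for k in (1, 5, 15, 50):
        model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        err = mean_squared_error(y_valid, model.predict(X_valid))
        if err < best_err:
            best_k, best_err = k, err

    # The winner's validation error is optimistic; the Test Set gives an
    # honest estimate of performance on unseen data.
    final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, y_train)
    print("chosen k:", best_k)
    print("test MSE:", mean_squared_error(y_test, final.predict(X_test)))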

XLMiner provides two methods of partitioning: Standard Partitioning and Partitioning with Oversampling.
There are two approaches to standard partitioning: random partitioning and user-defined partitioning.

Partitioning with Oversampling is used when the number of successes in the output variable is very low (e.g., callers who opt in to a short survey at the end of a customer service call). Most of the time, very few people finish the survey, so there isn't much information about these callers, which makes it hard to build a model based on them. When this happens, we have to use oversampling (also called weighted sampling). Oversampling can be used when there are only two classes and one is much more important than the other (e.g., callers who finish the survey as compared to callers who simply hang up).
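
A minimal sketch of the idea, upsampling the rare class with scikit-learn's resample (the 3% success rate and the data are made-up assumptions; XLMiner's own oversampling procedure may differ in detail):

    import numpy as np
    from sklearn.utils import resample

    rng = np.random.default_rng(seed=2)
    X = rng.normal(size=(1000, 3))
    y = (rng.random(1000) < 0.03).astype(int)  # ~3% "successes" (finished the survey)

    # Resample the rare positive class with replacement until the classes balance.
    pos, neg = X[y == 1], X[y == 0]
    pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=2)

    X_bal = np.vstack([neg, pos_up])
    y_bal = np.array([0] * len(neg) + [1] * len(pos_up))
    print(np.bincount(y_bal))  # equal class counts after upsampling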

The Classification method is a type of supervised learning that uses training data to put new observations into the right category. In classification, the algorithm learns a pattern from a set of data or observations and then assigns new observations to one of many classes. Classes can also be called categories, targets, or labels. In classification, the outcome variable is a category instead of a continuous value, such as "yes or no" or "0 or 1". Because the Classification method is a supervised learning method, it needs labeled data, that is, data that has both inputs and outputs.

Classification predictive modeling is all about using examples from a problem domain to figure out which class they belong to. The most common way to measure how well a classification model works is its classification accuracy. Because a predictive model's accuracy is often high (over 90%), the error rate is usually used instead to describe the model's performance.

To measure classification accuracy, you first use the classification model to predict the class of each sample in a test dataset. The predictions are then compared to the known labels of the examples in the test set. The accuracy is then found by dividing the number of correct predictions made on the test set by the total number of predictions made on the test set.
Conversely, the error rate is found by dividing the total number of wrong predictions on the test set by the total number of predictions on the test set.
Since accuracy and error rate are complementary, each can always be computed from the other.
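
In code, with hypothetical labels, the two complementary measures look like this:

    # Hypothetical true labels and model predictions for a test set.
    actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)  # 8 correct out of 10 = 0.8
    error_rate = 1 - accuracy         # complementary: 0.2

    print(accuracy, error_rate)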

A confusion matrix is not a measure for evaluating a model, but it does give information about the predictions. It is necessary to understand the confusion matrix in order to understand other classification metrics such as precision and recall.
The confusion matrix goes beyond classification accuracy by displaying the correct and incorrect (i.e., true or false) predictions for each class. A confusion matrix is a 2×2 matrix in the case of a binary classification problem. If there are three separate classes, the matrix is 3×3, and so on.
True Negative (TN) is the number of negative examples correctly predicted as negative.
False Positive (FP) is the number of negative examples incorrectly predicted as positive.
False Negative (FN) is the number of positive examples incorrectly predicted as negative.
True Positive (TP) is the number of positive examples correctly predicted as positive.
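
A short sketch with scikit-learn's confusion_matrix, reusing the hypothetical labels from above; for a binary problem it returns the 2×2 matrix [[TN, FP], [FN, TP]]:

    from sklearn.metrics import confusion_matrix

    actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    # ravel() flattens the 2x2 matrix into the four counts TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
    print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)  # TN: 4 FP: 1 FN: 1 TP: 4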

Precision is the proportion of predicted positive events that are genuinely positive. It assesses “how helpful the classifier’s results are.” Precision assesses how accurate our model is when its prediction is positive.

Precision = True Positive / (True Positive + False Positive)


For example, a precision of 90% means that when our classifier flags a customer as fraud, it is truly fraud 90% of the time. Precision focuses on the positive predictions: it shows how many of the positive predictions have come true.

Another way to look at the true positives is through the lens of recall. Recall is the proportion of actual positive events that are marked as such. It assesses “how complete the results are,” or what percentage of real positives are predicted to be positive.

Recall = True Positive / (True Positive + False Negative)


Recall focuses on the actual positive classifications. It reflects how many of the positive examples the model correctly predicts.
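
Using the same hypothetical labels, both measures follow directly from the confusion matrix counts (scikit-learn's precision_score and recall_score implement the two formulas above):

    from sklearn.metrics import precision_score, recall_score

    actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    # Precision = TP / (TP + FP); Recall = TP / (TP + FN).
    print("precision:", precision_score(actual, predicted))  # 4 / (4 + 1) = 0.8
    print("recall:   ", recall_score(actual, predicted))     # 4 / (4 + 1) = 0.8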

The ROC graph is another method for evaluating a classifier’s performance. The ROC graph is a two-dimensional plot that shows the false positive rate on the X axis and the true positive rate on the Y axis.
In many situations, the classifier includes a parameter that may be adjusted to increase the true positive rate at the cost of a higher false positive rate, or to decrease the false positive rate at the cost of a lower true positive rate. Each parameter setting produces one pair of values, a false positive rate and a true positive rate, and the collection of such pairs can be used to trace the ROC curve. The following are the characteristics of a ROC graph.
 The ROC curve or point is independent of the class distribution or the cost of mistakes.
 The ROC graph contains all of the information included in the error (confusion) matrix.
 The ROC curve is a visual tool for measuring the classifier’s trade-off between correctly identifying positive examples and mistakenly classifying negative examples as positive.
In many situations, the area under the ROC curve may be employed as a measure of accuracy; this is known as the area under the curve (AUC).
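
As a minimal sketch (synthetic data; the logistic regression scores are an assumed stand-in for any classifier with a tunable threshold): scikit-learn's roc_curve sweeps the threshold over the predicted scores to produce the (false positive rate, true positive rate) pairs, and roc_auc_score summarizes the curve by its area.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve, roc_auc_score

    rng = np.random.default_rng(seed=3)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

    # Scores rather than hard labels: each threshold on the score yields one
    # (false positive rate, true positive rate) point on the ROC curve.
    scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    fpr, tpr, thresholds = roc_curve(y, scores)
    print("ROC points:", len(fpr))
    print("area under the curve:", roc_auc_score(y, scores))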
