Introduction
The MNIST database (Modified National Institute of Standards and Technology database) is a
large database of handwritten digits that is commonly used for training image processing
systems and for training and testing machine learning models. It contains 60,000 training
images and 10,000 testing images. In this project we applied different machine learning (ML)
techniques to the MNIST data, namely k-means clustering, a confusion matrix, and a neural
network, to build a classifier or detect certain patterns.
Method
We have used the following techniques:
K-means clustering:
K-means clustering is one of the simplest and most popular unsupervised machine learning
algorithms. Typically, unsupervised algorithms make inferences from datasets using only input
vectors, without referring to known, or labelled, outcomes. The user defines a target number k,
which is the number of centroids to find in the dataset. A centroid is the imaginary or real
location representing the centre of a cluster. Every data point is allocated to exactly one of the
clusters so as to reduce the within-cluster sum of squares.
In other words, the K-means algorithm identifies k centroids and then allocates every data
point to the nearest cluster, while keeping the clusters as compact as possible. The ‘means’ in
K-means refers to averaging the data, that is, finding the centroid. To process the learning
data, the K-means algorithm starts with a first group of randomly selected centroids, which are
used as the starting points for the clusters, and then performs iterative (repetitive)
calculations to optimize the positions of the centroids.
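The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code used in this project: the function names (kmeans, squared_distance, mean_point, within_cluster_ss), the seed and iterations parameters, and the small two-dimensional toy points are all invented for the example. On MNIST each point would instead be a vector of 784 pixel values.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points (the centroid)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) after iterating assignment and update steps."""
    rng = random.Random(seed)
    # Start from k randomly selected data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Within-cluster sum of squares, the quantity k-means tries to reduce."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
wcss = within_cluster_ss(points, centroids, labels)
```

Because k-means is unsupervised, the cluster indices it returns are arbitrary. To use the clusters as a digit classifier, each cluster would still have to be mapped to a digit, for example by taking the most common true label among its members.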
Confusion Matrix:
In machine learning, and specifically in statistical classification, a confusion matrix, also
known as an error matrix, is a table that is often used to describe the performance of a
classification model (or “classifier”) on a set of test data for which the true values are known.
It allows the performance of an algorithm to be visualized and makes confusion between classes
easy to identify, e.g. one class being commonly mislabelled as another. Most performance
measures are computed from the confusion matrix.
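As a rough illustration of how such a table is built and read, the sketch below counts (true, predicted) label pairs and derives accuracy from the diagonal. The function names confusion_matrix and accuracy and the three-class toy labels are assumptions made for the example; MNIST would use ten classes and the real predictions of a classifier.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Build a table where entry [t][p] counts samples of true class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical labels for eight test samples and three classes.
true_labels = [0, 0, 1, 1, 2, 2, 2, 1]
predicted_labels = [0, 0, 1, 2, 2, 2, 1, 1]

matrix = confusion_matrix(true_labels, predicted_labels, num_classes=3)
overall_accuracy = accuracy(matrix)
```

Rows correspond to the true class and columns to the predicted class, so off-diagonal entries show which classes are confused with each other. In this toy example classes 1 and 2 are each mistaken for the other once.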
Neural Networks:
The figure shows the results and accuracy of k-means. In our assessment the results are good
and accurate.
Case 2: Using confusion matrix
Conclusion
We have used k-means, a confusion matrix, and neural networks on our MNIST data set and
found the results quite good and accurate. Of all these methods, neural networks achieved the
highest accuracy.