
CLASSIFICATION OF HANDWRITTEN DIGITS USING LOGISTIC REGRESSION AND NEURAL NETWORKS

Deepak Kinni
Department of Computer Science
University at Buffalo-SUNY
Buffalo, NY 14260
deepakki@buffalo.edu

ABSTRACT
Classification of handwritten digits is solved in the context of Probabilistic Discriminative Models, more specifically Logistic Regression. The concepts of multiclass logistic regression are applied to train a model that interprets handwritten digits with considerable accuracy. Alternative models, namely Neural Networks and Naïve Bayes, are applied to obtain a comparative study among the models. All models are further evaluated on the test data set to determine their underlying accuracies and to obtain a comparative result between them.

DATASET (GSC)
The GSC (Gradient, Structural, and Concavity) feature set uses an approximate multi-resolution approach, with features generated at three ranges: local, intermediate, and global.
The gradient features detect the local features of the image; they are helpful in understanding the stroke shape at short distances. The structural features are built upon the gradient features applied at longer distances and provide information about the stroke trajectories. The concavity features provide information about stroke relationships at long distances, which can be spread across the image.

LOGISTIC REGRESSION
The matrix $\mathbf{X}$ represents the data set provided to us. $\mathbf{X}$ is obtained by iteratively stacking the per-digit blocks $\mathbf{X}_0, \mathbf{X}_1, \ldots, \mathbf{X}_9$ and appending a column matrix of ones. Hence, the dimensions of the matrix are N x (D+1), where D = 512 and N = 20,000. The next step involves obtaining the target matrix $\mathbf{T}$, which consists of output values represented in the 1-of-K coding scheme. For example, for the samples in $\mathbf{X}_0$ the output vector is (1 0 0 0 0 0 0 0 0 0). Hence the augmented target matrix is of dimension N x K, where N = 20,000 and K = 10. The following step involves producing a weight matrix $\mathbf{W}$ filled with zeros; the resulting matrix has dimension K x (D+1).
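As a concrete illustration, the following NumPy sketch builds the augmented design matrix, the 1-of-K target matrix, and the zero-initialised weight matrix described above. The synthetic per-digit blocks and their sizes are assumptions standing in for the actual GSC data.

```python
import numpy as np

# Synthetic stand-ins for the ten per-digit GSC blocks (an assumption; in the
# project each block holds that digit's 512-dimensional binary feature vectors).
rng = np.random.default_rng(0)
digit_blocks = [rng.integers(0, 2, size=(200, 512)) for _ in range(10)]

# Stack the per-digit blocks and append a column of ones: N x (D+1), D = 512.
X = np.vstack(digit_blocks).astype(float)
X = np.hstack([X, np.ones((X.shape[0], 1))])

# 1-of-K coded target matrix: every digit-0 row maps to (1 0 0 0 0 0 0 0 0 0).
labels = np.concatenate([np.full(len(b), d) for d, b in enumerate(digit_blocks)])
T = np.eye(10)[labels]             # N x 10

# Zero-initialised weight matrix of dimension K x (D+1).
W = np.zeros((10, X.shape[1]))     # 10 x 513
```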

The activation $a_{nk} = \mathbf{w}_k^{\mathrm{T}} \boldsymbol{\phi}_n$ is used to obtain $y_{nk} = p(C_k \mid \boldsymbol{\phi}_n)$ in accordance with the softmax transformation

$$y_{nk} = \frac{\exp(a_{nk})}{\sum_{j=1}^{10} \exp(a_{nj})},$$
yielding the matrix $\mathbf{Y}$, the dimensions of which are N x K. We apply the Newton-Raphson formula, which requires the gradient and Hessian of the error function. These are given by the following equations: $\nabla E(\mathbf{w}) = \boldsymbol{\Phi}^{\mathrm{T}} (\mathbf{y} - \mathbf{t})$ and $\mathbf{H} = \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{R} \boldsymbol{\Phi}$, where $\mathbf{R}$ is a diagonal matrix of dimension N x N given by $R_{nn} = y_n (1 - y_n)$. Finally, the weights are calculated using the Newton-Raphson update $\mathbf{w}^{(\text{new})} = \mathbf{w}^{(\text{old})} - \mathbf{H}^{-1} \nabla E(\mathbf{w}^{(\text{old})})$. Using the weights $\mathbf{W}$, we test the data and note the number of misclassifications. Since the given data set is highly refined, we expect minimal error during testing.
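A minimal sketch of this update follows, reusing the X, T, and W matrices constructed earlier. Treating each class column independently, so that R stays an N x N diagonal as in the equations above, is a simplifying assumption (the exact multiclass Hessian couples the classes), and the small ridge term is added purely for numerical stability.

```python
import numpy as np

def softmax(A):
    # Row-wise softmax with max subtraction for numerical stability.
    A = A - A.max(axis=1, keepdims=True)
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def newton_step(W, X, T, ridge=1e-4):
    # One Newton-Raphson sweep over the K classes.
    Y = softmax(X @ W.T)                     # N x K activations y_nk
    for k in range(W.shape[0]):
        r = Y[:, k] * (1.0 - Y[:, k])        # diagonal of R: y_n (1 - y_n)
        grad = X.T @ (Y[:, k] - T[:, k])     # Phi^T (y - t)
        H = (X * r[:, None]).T @ X           # Phi^T R Phi
        H += ridge * np.eye(H.shape[0])      # keep H invertible (assumption)
        W[k] -= np.linalg.solve(H, grad)     # w_new = w_old - H^{-1} grad E
    return W

# Minimal usage with synthetic stand-ins for the GSC matrices built above:
rng = np.random.default_rng(0)
X = np.hstack([rng.random((200, 512)), np.ones((200, 1))])
T = np.eye(10)[rng.integers(0, 10, size=200)]
W = newton_step(np.zeros((10, 513)), X, T)
```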

NEURAL NETWORKS
This methodology presents a basic neural network model, which can be fundamentally described as a series of functional transformations. The solution requires us initially to construct M linear combinations of the input variables $x_1, \ldots, x_D$ in the following form:

$$a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)},$$

where $j = 1, \ldots, M$, the superscript indicating that the corresponding parameters belong to the first layer of the network. The $w_{ji}^{(1)}$ are referred to as the weights and the $w_{j0}^{(1)}$ as the biases. Each of these activations is then transformed by a differentiable, non-linear activation function $h(\cdot)$ to give $z_j = h(a_j)$. These correspond to the outputs of the basis functions, in this context called hidden units. The non-linear function $h(\cdot)$ is chosen based upon the nature of the data and the assumed distribution of the target variables. The values $z_j$ are then linearly combined to give the output unit activations

$$a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)},$$

where $k = 1, \ldots, K$, with K = 10 in our case corresponding to the total number of outputs.
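The forward pass these equations describe can be sketched in a few lines of NumPy; the tanh activation and the weight shapes below are illustrative assumptions, not choices prescribed by the project.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    a = W1 @ x + b1      # first-layer activations a_j (M linear combinations)
    z = np.tanh(a)       # hidden units z_j = h(a_j); tanh is an illustrative h(.)
    return W2 @ z + b2   # output unit activations a_k

# Illustrative shapes for this project: D = 512 inputs, M hidden units, K = 10 outputs.
D, M, K = 512, 25, 10
rng = np.random.default_rng(0)
x = rng.random(D)
out = forward(x, 0.01 * rng.standard_normal((M, D)), np.zeros(M),
              0.01 * rng.standard_normal((K, M)), np.zeros(K))
print(out.shape)         # (10,)
```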

For the purpose of demonstration, we shall use the nprtool tool set to determine the error rates. nprtool is specifically made to implement a feed-forward neural network for binary data and binary output, and it trains the network using back-propagation. It reports accuracy in two forms: MSE and %Error; for this project, %Error (%E) has been taken as the measure of accuracy. Since the tool can perform testing after training, we combine the training data and the testing data. We therefore use 1,979 training samples of each digit, along with 1,500 test samples.

Hence, our combined data set is of dimension 21,290 x 512 and the target matrix is 21,290 x 10. The number of neurons in the input layer is taken equal to the number of features (D = 512), and the number of neurons in the output layer is taken equal to the number of classes (K = 10). Of the data, 75% is used for training, 15% for validation, and 10% for testing. The number of hidden neurons is varied over 20, 25, and 30; an analogous Python setup is sketched below.
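nprtool is a MATLAB tool, so the following is only a rough Python analogue of this setup using scikit-learn's MLPClassifier, not the workflow actually used in the paper. The synthetic data and the mapping of the 75/15/10 split onto a test hold-out plus early-stopping validation fraction are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 21,290 x 512 binary GSC matrix and digit labels
# (an assumption; substitute the real features and labels).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 512)).astype(float)
y = rng.integers(0, 10, size=2000)

# Hold out 10% for testing; early stopping carves out a further validation
# fraction, roughly mirroring the 75/15/10 split used with nprtool.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

for hidden in (20, 25, 30):
    clf = MLPClassifier(hidden_layer_sizes=(hidden,),
                        early_stopping=True,
                        validation_fraction=0.15,
                        max_iter=500,
                        random_state=0)
    clf.fit(X_train, y_train)
    pct_error = 100.0 * (1.0 - clf.score(X_test, y_test))
    print(f"{hidden} hidden neurons: %E = {pct_error:.4f}")
```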

NEURAL NETWORKS (EXPECTED RESULTS)

Number of Hidden Neurons | %E for Training | %E for Validation | %E for Testing
20                       | 5.0100e-2       | 7.019e0           | 6.9958e0
25                       | 9.0100e-1       | 2.316e0           | 2.8182e0
30                       | 7.2207e0        | 10.4224e0         | 10.9915e0

Hence, with 25 hidden neurons the testing %E of 2.8182 corresponds to an accuracy of 100 - 2.8182, or approximately 97.2%. It is to be noted that the values obtained are the result of retraining the network.
