
Vol. 6, No. 1 (2006) 45–59
© Imperial College Press

DIGIT RECOGNITION

F. H. C. TIVIVE and A. BOUZERDOUM
School of Electrical, Computer and Telecommunications Engineering
University of Wollongong, Northfields Avenue
Wollongong, NSW 2522, Australia
tivive@uow.edu.au
a.bouzerdoum@ieee.org

Received 6 September 2005
Revised 16 February 2006

In this paper, we apply a new neural network model, namely shunting inhibitory convolutional neural networks, or SICoNNets for short, to the problem of handwritten digit recognition. This type of network has a generic and flexible architecture, where the processing is based on the physiologically plausible mechanism of shunting inhibition. A hybrid first-order training method, called QRProp, is developed based on the three training algorithms Rprop, Quickprop, and SuperSAB. The MNIST database is used to train and evaluate the performance of SICoNNets in handwritten digit recognition. A network with 24 feature maps and 2722 free parameters achieves a recognition accuracy of 97.3%.

Keywords: Convolutional neural networks; shunting inhibitory neurons; handwritten digit recognition; systematic connection schemes.

1. Introduction

Evolving from our understanding of neuro-biological systems, artificial neural networks give computers an amazing capacity to learn complex tasks from examples. They have become an alternative computational approach for problems that do not have algorithmic solutions, or for which the algorithmic solutions are too difficult to express analytically. Their success can be attributed in part to their fault tolerance, parallel processing, and generalization ability. The most popular neural network architecture in use today, and discussed in almost every neural network textbook, is the multilayer perceptron (MLP). MLPs have proven to be a powerful computational tool for many problems in pattern recognition, function approximation, and data analysis, to name a few. However, MLPs have some drawbacks when applied directly, without any preprocessing, to high-dimensional data such as in image analysis, image understanding and machine vision. The main problem is that the size of the network grows with the size of the input image, which makes network training a much harder task. Moreover, over-fitting may occur, and the generalization ability of the network suffers when there are not sufficient training samples. The common approach to circumvent these problems is to use some preprocessing techniques to extract lower-dimensional features from the input data. Feature extraction, however, is a computationally expensive process and requires prior knowledge about the data to design the feature extractor.

In the past 20 years, researchers have focused not only on the development of training algorithms for MLPs, but also on the identification of significant network structures and weight constraints that can reduce the number of trainable parameters. Inspired by Hubel and Wiesel's hierarchical vision model of the cortex, Fukushima et al.1 developed the neocognitron, a two-dimensional (2D) neural network architecture for visual pattern recognition. LeCun et al.,2 on the other hand, proposed a series of convolutional neural network (CoNN) architectures, based upon the three structural concepts of local receptive fields, weight sharing and sub-sampling. These networks can easily deal with variability in 2D shapes and possess a certain degree of local invariance to distortion and translation. Consequently, they have attracted considerable interest and gained popularity for solving visual pattern recognition problems such as face detection,3 face recognition,4 facial expression analysis,5 and medical image pattern recognition.6

In Ref. 7, LeCun et al. reported their latest CoNN, widely known as LeNet-5, for handwritten digit recognition. The network consists of seven processing layers, where the first four layers are two successive pairs of convolutional and sub-sampling layers with a total of 44 feature maps for feature extraction. The fifth and sixth layers are convolutional and fully-connected layers with 120 and 84 neurons, respectively, and the output layer has ten neurons to represent the ten digit classes. Overall, the network has 60,000 trainable parameters, and was trained and tested on the MNIST database8 with an error rate of 0.8%. Calderón et al.9 developed a CoNN structure similar to LeNet-5, which uses Gabor filters as receptive fields for the first convolutional layer; at the output layer, 84 perceptrons are used to represent the output as a grayscale image of size 12 × 7. To improve the performance of their handwritten digit recognition system, they applied a boosting method to their networks so as to achieve an error rate of 0.68%. Simard and his colleagues,10 on the other hand, used a much simpler CoNN structure for handwritten digit recognition, with four processing layers and a network retina of size 29 × 29. The first layer has five feature maps of size 13 × 13, and the second layer has 50 feature maps. In each layer, the size of the feature map is reduced from n to (n − 3)/2, where n is the original size, and the receptive field size used throughout the network is 5 × 5. The last two layers are equivalent to a two-layer fully-connected MLP with 100 hidden neurons and ten output neurons. Expanding the training set through elastic distortions and using a cross-entropy error function, they achieved an error rate of 0.4%.
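To make the size reduction concrete, applying the (n − 3)/2 rule to the 29 × 29 retina reproduces the quoted first-layer map size, and, extrapolating the same rule one layer further (our arithmetic, not stated in Ref. 10), gives 5 × 5 maps in the second layer:

(29 − 3)/2 = 13, \qquad (13 − 3)/2 = 5.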

Gorgevik and Cakmakov11 proposed another approach, combining two neural networks and a support vector machine to implement a three-stage classifier for handwritten digit recognition. First, the digit images are preprocessed for slant correction. Then 292 features are extracted from the image as inputs for the cascade of classifiers. Based on the MNIST database, their three-stage classifier has an error rate of 0.83%. The experimental results of these neural-based approaches show that neural networks can yield state-of-the-art performance. Nevertheless, these networks are still plagued by the problem of a huge number of trainable parameters.

Recently, we have proposed a new class of convolutional neural networks, known as shunting inhibitory convolutional neural networks (SICoNNets), which can be easily tailored to the user's specifications.12 The key characteristics of these networks are the processing element used for feature extraction and the systematic interconnection schemes between the different hidden layers. The processing elements in the hidden layers are based on the shunting inhibition mechanism, which plays an important role in visual information processing in the cortex.13–15 The reason for using this type of processing element is that shunting inhibitory neurons have been shown to be more computationally powerful than traditional sigmoid-type neurons. Contrary to a sigmoid neuron, a single shunting inhibitory neuron can solve linearly nonseparable classification problems by forming nonlinear decision boundaries.16,17 In Ref. 18, the shunting inhibitory convolutional neural network was applied to a two-class pattern classification task, discriminating segmented images between face and non-face, and was subsequently developed into a face detection system that can detect and localize faces in complex background scenes.

In this paper, we apply SICoNNets to handwritten digit recognition. The next section gives a detailed description of the shunting inhibitory convolutional neural network architecture. Section 3 describes the training algorithms that have been developed for these networks, followed by the description of the handwritten digit recognition system in Sec. 4. The experimental results and performance analysis are presented in Sec. 5, and final concluding remarks are given in Sec. 6.

2. Description of SICoNNet Architecture

The proposed convolutional neural networks, SICoNNets, have a flexible do-it-yourself network architecture in which the following network parameters can be specified: the input size, the receptive field size, the number of layers and/or number of feature maps, the number of outputs, and the connection scheme between layers. The input layer is a 2D array used by the network to receive images from the environment. The input layer is succeeded by several processing layers, or hidden layers, and each hidden layer is made up of planes of shunting inhibitory neurons, known as feature maps. Each neuron in a feature map receives inputs from a small local neighborhood in the previous layer, its receptive field. However, all the neurons in a feature map share the same set of connection weights [Fig. 1(a)], and each hidden layer has a fixed receptive field size. Since all neurons in a feature map share the same set of weights, the same operation is performed on different parts of the input plane. Hence, the same elementary visual feature is extracted from different positions in the input image. Other feature maps of the same layer operate with different sets of weights to extract different types of local features. In higher layers, the feature maps extract higher-order features by taking their inputs from one or more feature maps in the preceding layer. In each hidden layer, another structural process, namely sub-sampling, is performed to reduce the spatial resolution of the 2D input by shifting the centers of the receptive fields of adjacent neurons by two positions in both directions [see Fig. 1(b)]; as a result, the size of the feature maps is reduced by one quarter in each hidden layer. This introduces a certain degree of invariance to translation and input distortion, as the absolute location of the extracted feature becomes less important in higher layers so long as its approximate position relative to other features is preserved.

Fig. 1. Schematic diagrams illustrating (a) the application of local receptive fields and (b) the movement of a receptive field over the input image, shifted horizontally and vertically by two positions.

The computation performed by the shunting inhibitory neuron at location (i, j) in the kth feature map of the Lth layer is given by

Z_{L,k}(i,j) = X_{L,k}(i,j) / ( a_{L,k}(i,j) + Y_{L,k}(i,j) ),    i, j = 1, ..., F_L,    (1)

where

X_{L,k}(i,j) = g_L\Big( \sum_{m=1}^{S_{L-1}} [C_{L,k} \otimes Z_{L-1,m}](2i, 2j) + b_{L,k}(i,j) \Big)

and

Y_{L,k}(i,j) = f_L\Big( \sum_{m=1}^{S_{L-1}} [D_{L,k} \otimes Z_{L-1,m}](2i, 2j) + d_{L,k}(i,j) \Big).

The parameters C_{L,k} and D_{L,k} are the sets of excitatory and inhibitory weights, respectively, b_{L,k} and d_{L,k} are scalar parameters called the biases, a_{L,k} is the passive decay rate of the neuron, g_L and f_L are the activation functions, S_{L-1} is the number of feature maps in the (L−1)th layer, and F_L is the size of the feature map at the Lth layer. In a feature map, all the neurons share the same set of weights, C_{L,k} and D_{L,k}, as well as the biases and the passive decay rate parameter. In order to avoid division by zero in (1), a_{L,k} is constrained to be positive, so that

a_{L,k}(i,j) + Y_{L,k}(i,j) > 0.    (2)
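To make Eq. (1) concrete, the following NumPy sketch (our illustration, not the authors' code) computes one shunting inhibitory feature map with the stride-2 receptive field movement of Fig. 1(b). The zero padding that keeps the output at exactly half the input size, and all variable names, are our assumptions; the tanh/exp activation pair follows the first-layer choice stated in Sec. 4.

import numpy as np

def shunting_feature_map(Z_prev, C, D, b, d, a, g=np.tanh, f=np.exp):
    """One feature map of Eq. (1): Z = g(C*Z + b) / (a + f(D*Z + d)).

    Z_prev : list of 2D maps from layer L-1 feeding this map
    C, D   : excitatory / inhibitory receptive-field kernels (square)
    b, d   : scalar biases shared across the map; a : passive decay rate
    """
    r = C.shape[0]
    n = Z_prev[0].shape[0]
    out = n // 2                         # map area shrinks by one quarter
    X = np.zeros((out, out))
    Y = np.zeros((out, out))
    for Zm in Z_prev:
        Zp = np.pad(Zm, r // 2)          # assumed zero padding at borders
        for i in range(out):             # receptive-field centers move by
            for j in range(out):         # two positions in both directions
                patch = Zp[2 * i:2 * i + r, 2 * j:2 * j + r]
                X[i, j] += np.sum(C * patch)
                Y[i, j] += np.sum(D * patch)
    X, Y = g(X + b), f(Y + d)
    return X / (a + Y)                   # a + Y > 0 since a > 0 and f = exp

rng = np.random.default_rng(0)
Z0 = [rng.standard_normal((24, 24))]     # a 24 x 24 input plane (Sec. 4)
C = rng.uniform(-0.2, 0.2, (5, 5))       # 5 x 5 receptive fields (Sec. 4)
D = rng.uniform(-0.2, 0.2, (5, 5))
print(shunting_feature_map(Z0, C, D, b=0.1, d=0.1, a=0.5).shape)  # (12, 12)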

In contrast to some existing CoNNs,3,4,19 in which the connection strategy is non-trivial and manually chosen, the proposed CoNNs were developed with three systematic connection schemes: full-, binary- and Toeplitz-connection. In the full-connection scheme, each hidden layer contains an arbitrary number of feature maps, which are fully connected to the feature maps in the succeeding layer. This scheme is similar to the MLP, where the number of hidden layers and hidden neurons (equivalent to feature maps) can be changed arbitrarily. The binary-connection and Toeplitz-connection are partial-connection schemes in which the first hidden layer can have an arbitrary number of feature maps so long as the subsequent layer has twice that number of feature maps. In the binary connection, each feature map branches out to two feature maps in the succeeding layer, as shown in Fig. 2(a), whereas in the Toeplitz connection each feature map may have one-to-one or one-to-many links with feature maps of the preceding layer. As an example, Table 1 illustrates the connections between the first (L1) and second (L2) hidden layers. Suppose that L1 has four feature maps, labeled A–D, and L2 has eight feature maps, labeled 1 to 8 (first column). Feature maps 1 and 8 have one-to-one connections with feature maps A and D, respectively. Feature map 2 makes connections with feature maps A and B. Feature map 3 is connected to feature maps A–C. The rest of the connections form a Toeplitz matrix, hence the name [see Fig. 2(b)]. In other words, each feature map of L1 connects to the same number of feature maps of L2 (in this case five), and its connections appear along a diagonal of the connection matrix. There are two advantages of partial-connection schemes:

first, they reduce the number of connections within the network, which may increase the generalization ability;

second, they diversify the extraction of high-order features by taking inputs from different sets of feature maps rather than from all feature maps in the previous layer.

Both partial schemes can be generated mechanically, as the sketch below illustrates.
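The following sketch (our illustration, not code from the paper) builds the binary and Toeplitz connection matrices for a layer of m feature maps feeding a layer of 2m feature maps; for m = 4 the Toeplitz case reproduces Table 1 below.

import numpy as np

def binary_connections(m):
    """Each of the m source maps branches out to two maps in the next layer."""
    A = np.zeros((2 * m, m), dtype=int)
    for src in range(m):
        A[2 * src, src] = 1              # first branch
        A[2 * src + 1, src] = 1          # second branch
    return A

def toeplitz_connections(m):
    """Each source map connects to m + 1 consecutive maps of the next layer,
    so the 2m x m connection matrix is constant along its diagonals."""
    A = np.zeros((2 * m, m), dtype=int)
    for src in range(m):
        A[src:src + m + 1, src] = 1      # a diagonal band of ones
    return A

print(toeplitz_connections(4))           # rows 1..8, columns A..D of Table 1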

Fig. 2. The two partial-connection schemes: (a) binary connection; (b) Toeplitz connection.

Table 1. Connections from L1 to L2 (Toeplitz scheme).

L2 Feature Map    Connections from L1
1                 A
2                 A, B
3                 A, B, C
4                 A, B, C, D
5                 A, B, C, D
6                 B, C, D
7                 C, D
8                 D

At the output layer, sigmoid neurons are used as processing elements to classify the features extracted at the last hidden layer. To reduce the number of weights, a local averaging operation is applied on all feature maps in the last hidden layer; that is, a 2 × 2 non-overlapping receptive field is used across each feature map to average four outputs into a single signal, which is fed to the neurons at the output layer. The response of an output neuron is given by

y = h\Big( \sum_{i=1}^{S_N} w_i z_i + b \Big),    (3)

where y is the neural response of the sigmoid neuron, h is the output activation function, the w_i's are the connection weights, the z_i's are the input signals, S_N is the number of input signals, and b is the bias term.
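As an illustration of this output stage, the sketch below (ours, not the authors' code) applies the 2 × 2 local averaging and Eq. (3) to the 16 feature maps of the digit network of Sec. 4, assuming they are 6 × 6 (a 24 × 24 input halved twice) and taking h = tanh as stated there.

import numpy as np

def output_layer(feature_maps, W, b, h=np.tanh):
    """Classify the last hidden layer's feature maps, Eq. (3).

    feature_maps : array of shape (n_maps, s, s) with s even
    W            : weights of shape (n_classes, n_maps * (s//2) ** 2)
    b            : biases of shape (n_classes,)
    """
    m, s, _ = feature_maps.shape
    # 2 x 2 non-overlapping local averaging halves each spatial dimension
    pooled = feature_maps.reshape(m, s // 2, 2, s // 2, 2).mean(axis=(2, 4))
    z = pooled.reshape(-1)               # the S_N averaged input signals
    return h(W @ z + b)                  # one sigmoid response per class

rng = np.random.default_rng(1)
maps = rng.standard_normal((16, 6, 6))   # 16 second-layer maps of size 6x6
W = rng.uniform(-0.1, 0.1, (10, 16 * 3 * 3))
y = output_layer(maps, W, rng.uniform(-0.1, 0.1, 10))
print(y.argmax())                        # winning neuron -> predicted digit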

00179

51

3. Training Algorithm

To train the SICoNNets, a batch training algorithm based on the combination of Rprop,20 Quickprop,21 and SuperSAB22 has been developed and named QRProp. It is a local adaptation technique, in which the temporal behavior of the partial derivative of each weight is used in the computation of the weight update. For comparison, the Levenberg–Marquardt (LM) algorithm is also implemented, where the Jacobian matrix is computed using a modified error-backpropagation rule similar to the one developed by Hagan23 (see Ref. 12 for more details).

The weight update rule of the QRProp method is given by

\vec{W}(k+1) = \vec{W}(k) + \Delta\vec{W}(k) + \vec{\mu}(k) \circ \Delta\vec{W}(k-1),    (4)

where the parameter \vec{W}(k) is the weight vector obtained by reshaping all the weights in the receptive fields and the biases of the neurons in the feature maps; the elements are taken column-wise from the first hidden layer to the last layer of the network, forming a large column vector (\circ denotes the element-wise product and \vec{\mu}(k) the vector of adaptive momentum rates defined below). The weight update \Delta\vec{W}(k) is computed using the same principle as the Rprop algorithm; that is, each local weight w_i(k) in the weight vector \vec{W}(k) has its own step size, \Delta_i(k), which is adjusted according to the behavior of the local gradient g_i(k) during two successive iterations:

\Delta_i(k) = \begin{cases} \min(1.2\,\Delta_i(k-1), \Delta_{max}), & \text{if } g_i(k)\,g_i(k-1) > 0,\\ \max(0.5\,\Delta_i(k-1), \Delta_{min}), & \text{if } g_i(k)\,g_i(k-1) < 0,\\ \Delta_i(k-1), & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, n,    (5)

where n is the number of trainable weights, and \Delta_{max} and \Delta_{min} are the upper and lower limits of the step size, respectively; the initial value \Delta_i(0) is set to 0.001, and the respective limits \Delta_{max} and \Delta_{min} are 10 and 10^{-10}. The local weight update of the ith weight is then determined by

\Delta w_i(k) = -\mathrm{sgn}(g_i(k))\,\Delta_i(k),    (6)

where sgn denotes the signum function. When the current local gradient changes sign with respect to the previous local gradient of the same weight, the stored local gradient is set to zero so as to avoid an update of that weight in the next iteration. Furthermore, when the product of the current and previous local gradients is less than zero and there is an increase in the network error E, the ith weight update is reverted to the previous weight update, multiplied by an adaptive momentum rate:

if g_i(k)\,g_i(k-1) < 0 and E(k) > E(k-1), then \Delta w_i(k) = -\mu_i(k)\,\Delta w_i(k-1).    (7)

The adaptive momentum rate \mu_i(k) of the ith weight used in (4) and (7) is computed from the magnitude of the Quickprop step, bounded within the range [0.5, 1.5]:

\tilde{\mu}_i(k) = \left| \frac{g_i(k)}{g_i(k-1) - g_i(k)} \right|,    (8)

\tilde{\mu}_i(k) = \min(\tilde{\mu}_i(k), 1.5),    (9)

\mu_i(k) = \begin{cases} \max(\tilde{\mu}_i(k), 0.5), & \text{if } g_i(k)\,g_i(k-1) < 0,\\ 0, & \text{if } g_i(k)\,g_i(k-1) = 0. \end{cases}    (10)

Moreover, when there is a decrease in the current network error with respect to the previous error, a small percentage of the negative gradient is added to the weight:

\vec{W}(k+1) = \vec{W}(k+1) - \vec{\eta}(k) \circ \vec{g}(k),    (11)

where \vec{\eta}(k) is a vector of learning rates, adapted using a similar principle to the SuperSAB method,

\tilde{\eta}_i(k) = \begin{cases} 1.2\,\eta_i(k-1), & \text{if } g_i(k)\,g_i(k-1) > 0,\\ \eta_i(k-1), & \text{otherwise}, \end{cases}    (12)

and bounded above by

\eta_i(k) = \min(\tilde{\eta}_i(k), 0.9).    (13)

Input: Initialize \Delta_i ← 0.001, \eta_i ← 0.01, \mu_i ← 0.1. Calculate the local gradient.
1:  while stopping criterion is not met do
2:      Calculate the adaptive momentum rate \tilde{\mu}_i(k) according to (8) and bound it above by (9).
3:      if g_i(k) g_i(k−1) > 0 then
4:          \Delta_i(k) ← min(1.2 \Delta_i(k−1), \Delta_{max}),
5:          \tilde{\eta}_i(k) ← 1.2 \eta_i(k−1).
6:      else if g_i(k) g_i(k−1) < 0 then
7:          \Delta_i(k) ← max(0.5 \Delta_i(k−1), \Delta_{min}),
8:          \tilde{\eta}_i(k) ← \eta_i(k−1),
9:          \mu_i(k) ← max(\tilde{\mu}_i(k), 0.5),
10:         g_i(k) ← 0.
11:     else if g_i(k) g_i(k−1) = 0 then
12:         \mu_i(k) ← 0,
13:         \Delta_i(k) ← \Delta_i(k−1),
14:         \tilde{\eta}_i(k) ← \eta_i(k−1).
15:     end if
16:     \eta_i(k) ← min(\tilde{\eta}_i(k), 0.9).
17:     \Delta w_i(k) ← −sgn(g_i(k)) \Delta_i(k).
18:     if g_i(k) g_i(k−1) < 0 and E(k) > E(k−1) then
19:         \Delta w_i(k) ← −\mu_i(k) \Delta w_i(k−1).
20:     end if
21:     w_i(k+1) ← w_i(k) + \Delta w_i(k) + \mu_i(k) \Delta w_i(k−1).
22:     if E(k) < E(k−1) then
23:         w_i(k+1) ← w_i(k+1) − \eta_i(k) g_i(k).
24:     end if
25: end while
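The per-weight logic of the listing condenses into a few vectorized lines. The following sketch is our reading of the listing, not code from the paper: the learning-rate increase factor is inferred from Eq. (12), the momentum for a positive gradient product is taken as the bounded Quickprop magnitude, and the small denominator guard is ours.

import numpy as np

def qrprop_step(w, dw_prev, g, g_prev, delta, eta, E, E_prev,
                d_min=1e-10, d_max=10.0):
    """One QRProp iteration over a weight vector, following Eqs. (4)-(13).

    w, dw_prev, g, g_prev, delta, eta are equal-length NumPy arrays;
    E, E_prev are the current and previous values of the network error.
    Returns (new weights, applied update, stored gradient for next step).
    """
    g = g.copy()
    prod = g * g_prev
    # Quickprop-style momentum magnitude, Eqs. (8)-(9); the 1e-12 guard
    # against a zero denominator is our addition.
    mu_t = np.minimum(np.abs(g / (g_prev - g + 1e-12)), 1.5)
    # Eq. (10): lower-bounded on a sign change, zero on a vanishing product.
    mu = np.where(prod < 0, np.maximum(mu_t, 0.5),
                  np.where(prod > 0, mu_t, 0.0))
    # Rprop-style step sizes, Eq. (5).
    delta[prod > 0] = np.minimum(1.2 * delta[prod > 0], d_max)
    delta[prod < 0] = np.maximum(0.5 * delta[prod < 0], d_min)
    # SuperSAB-style learning rates, Eqs. (12)-(13).
    eta[prod > 0] *= 1.2
    np.minimum(eta, 0.9, out=eta)
    dw = -np.sign(g) * delta                     # Eq. (6)
    flip = prod < 0
    if E > E_prev:                               # Eq. (7): backtrack
        dw[flip] = -mu[flip] * dw_prev[flip]
    g[flip] = 0.0                                # block the next update
    w_new = w + dw + mu * dw_prev                # Eq. (4)
    if E < E_prev:                               # Eq. (11): gradient nudge
        w_new = w_new - eta * g
    return w_new, dw, g

Here the dw and g returned from one call become dw_prev and g_prev of the next, while delta and eta persist across iterations.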


4. Handwritten Digit Recognition System

The SICoNNet used for digit recognition is a three-layer network which has eight feature maps in the first hidden layer and 16 feature maps in the second hidden layer. At the output layer, there are 10 sigmoid neurons, one for each digit. The receptive field size used throughout the network is 5 × 5 pixels, and the input layer is 24 × 24 pixels. The reason for using this input size is to ensure that the feature maps in the first and second hidden layers have even size after the sub-sampling operation. The activation functions, g_L and f_L, chosen for the first hidden layer are the hyperbolic tangent, g_L(x) = (e^x − e^{−x})/(e^x + e^{−x}), and the exponential function, f_L(x) = e^x, respectively; whereas in the second layer, g_L is the logarithmic sigmoid function, g_L(x) = 1/(1 + e^{−x}). At the output layer, the activation function, h, applied to the sigmoid neurons is the hyperbolic tangent function. The desired outputs are ten-element column vectors whose elements are set to −1, except the element corresponding to the input digit, which is set to 1. Overall, the network has 2722 free parameters that need to be adapted during the training process. Before training commences, the network parameters are initialized with random values drawn from a uniform distribution. The weights of the receptive fields are initialized in the range [−1/w, 1/w], where w is the width of the receptive field. The bias parameters are initialized between −1 and 1, whereas the passive decay rate term is initialized in the range (0, 1], subject to the condition in (2).
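The figure of 2722 free parameters can be verified from the architecture just described. The short check below is our own arithmetic; it assumes the binary-connection scheme eventually chosen in Sec. 5.3 (each first-layer map sees the input plane, each second-layer map sees one first-layer map) and that the 6 × 6 second-layer maps are averaged down to 3 × 3 before the output layer.

# Per feature map: one excitatory and one inhibitory 5x5 receptive field,
# two biases (b, d) and one passive decay rate (a), all shared across the map.
rf = 5 * 5
per_map = 2 * rf + 2 + 1               # 53 shared parameters per feature map

layer1 = 8 * per_map                   # 8 maps, each fed by the input plane
layer2 = 16 * per_map                  # 16 maps, one incoming map each
inputs_to_output = 16 * 3 * 3          # 16 maps averaged down to 3x3 signals
output = 10 * (inputs_to_output + 1)   # weights plus bias per sigmoid neuron

print(layer1 + layer2 + output)        # -> 2722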

To train the SICoNNet for digit recognition, sample digit patterns are needed for training and testing the network; we used the MNIST database for both. This database contains real-world samples of handwritten digits; it is publicly available for evaluating machine learning and pattern recognition systems on handwritten digit recognition. It contains two disjoint sets of handwritten digit patterns of size 28 × 28 pixels: one set is used for training and contains 60,000 samples, and the other set has 10,000 samples for testing. As the input size of the network is 24 × 24 pixels, all the patterns in the database are resized using nearest-neighbor interpolation. At the output layer, each neuron gives an output, and the neuron with the maximum response is considered the winning neuron, which determines the class of the input pattern.
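A sketch of this data pipeline (ours; the nearest-neighbor resize, the ±1 target coding and the winner-take-all decision follow the text, while the helper names are invented):

import numpy as np

def resize_nearest(img28, out=24):
    """Resize a 28x28 digit to 24x24 by nearest-neighbor index mapping."""
    idx = (np.arange(out) * img28.shape[0] / out).astype(int)
    return img28[np.ix_(idx, idx)]

def target_vector(digit, n_classes=10):
    """Desired output: -1 everywhere except +1 for the true digit."""
    t = -np.ones(n_classes)
    t[digit] = 1.0
    return t

def predict(outputs):
    """The neuron with the maximum response is the winning class."""
    return int(np.argmax(outputs))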

5. Experimental Results

In this section, we present the experimental results. First, two preliminary experiments are conducted to analyze the training algorithms and to determine the most suitable connection scheme for the network. Then, the chosen network structure, with the selected connection scheme, is trained and evaluated on the MNIST database,8 where the digit patterns have been converted into binary images.

5.1. Analysis of the training methods

To analyze the QRProp training method and the LM algorithm, a small network with four feature maps in the first hidden layer and eight feature maps in the second hidden layer was trained on a set of 5000 handwritten digit patterns, where 500 samples were taken from each digit class of the MNIST training set, based on a five-fold cross-validation procedure. In each fold, 4000 patterns were gathered for training and 1000 patterns for testing. For analysis purposes, the training mean square error (MSE), training time and number of training epochs were recorded in each fold and averaged across the five folds. As the training time depends on the machine used, we compute the training time in terms of the gradient descent epoch time unit, or gdeu. One gdeu is defined as the average time taken by the network to perform one gradient descent training epoch on a fixed training set and a fixed-size network, and it remains constant throughout the gradient descent training process. On a PC with a 3 GHz CPU and 2 GB RAM, using MATLAB as the programming language, one gdeu is approximately 42.5 seconds, based on a network with 1366 trainable parameters and a training set of 4000 samples.
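Since one gdeu is simply the measured wall-clock time of a single gradient-descent epoch on this fixed task, converting a recorded training time into gdeus is a single division; a small sketch of that normalization (the 42.5 s value is the one quoted above; the function is ours):

GDEU_SECONDS = 42.5   # one gradient-descent epoch: 1366-parameter network,
                      # 4000 training samples, 3 GHz CPU (as reported above)

def to_gdeus(training_seconds, gdeu_seconds=GDEU_SECONDS):
    """Express a machine-dependent training time in gdeu units."""
    return training_seconds / gdeu_seconds

print(to_gdeus(3600.0))   # a one-hour run is about 84.7 gdeus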

Figure 3 shows that both training methods converge, at different speeds. In terms of the mean square error (MSE) as a function of the number of epochs, Fig. 3(a) shows that the LM algorithm has a better convergence speed than QRProp; however, in terms of training time, Fig. 3(b) shows that the MSE of the LM algorithm decreases more slowly than that of QRProp. Moreover, after a certain number of gdeus, the MSE of the LM algorithm remains constant, indicating that the training algorithm has reached a local minimum. On the other hand, the MSE of the QRProp method gradually decreases and becomes smaller than that of the LM algorithm. Another test was conducted to analyze the classification performance of the training algorithms. The results, based on five-fold cross-validation, are shown in Fig. 4. Since the LM algorithm shows better convergence, the trained network yields higher classification accuracy after a few epochs; for instance, at 20 iterations, the trained network achieves a classification accuracy of 96.8% on the 4000 training patterns and 94.9% on the 1000 test patterns. However, the LM algorithm is known to have some shortcomings, such as the computation of the Hessian matrix and its storage. On a large training set of 60,000 samples, it is not possible to train a network with 2722 trainable parameters using the LM algorithm, due to the huge amount of memory required to store the Jacobian and Hessian matrices. On the contrary, the QRProp method requires only a few gradient and function evaluations to update the weights. Furthermore, when training for a longer period of time, say 250 gdeus, the classification performance achieved by QRProp on the test set is similar to that of the LM algorithm. Therefore, QRProp is chosen to train the SICoNNets for handwritten digit recognition.

Fig. 3. The convergence speed of the training algorithms as a function of (a) the number of training epochs and (b) the number of gdeus.

Fig. 4. The classification accuracy of the training algorithms versus the number of training epochs and the training time, based on (a) the training set and (b) the test set.

5.2. Classification performance of three SICoNNet architectures

In this experiment, we train and evaluate the classification performance of three different SICoNNet architectures: fully-connected, binary-connected and Toeplitz-connected. Each SICoNNet was trained on a set of 10,000 patterns and tested on the entire test set of the MNIST database. The classification rates of the different network architectures are presented in Table 2. Clearly, all three networks achieve classification rates higher than 90%. The best classification rate, 94.1%, is achieved by the binary-connected network, followed by the Toeplitz-connected network at 93.6%; the fully-connected network achieves 90.2%.

Table 2. Classification rate for each digit class (%).

SICoNNets   0      1      2      3      4      5      6      7      8      9      Accuracy (%)
Binary      98.3   97.9   95.1   92.2   93.0   91.3   95.9   93.4   93.0   91.8   94.1
Toeplitz    97.1   96.7   96.6   95.0   92.8   92.2   95.5   91.0   89.3   90.1   93.6
Full        95.9   96.5   92.4   86.6   89.1   85.2   93.7   89.0   84.7   88.2   90.2

Among the three networks, the partially-connected networks perform better than the fully-connected network. This may be due to the fact that the partially-connected networks have fewer connections within the network structure.

5.3. Performance of the handwritten digit recognition system

After analyzing the experimental results of the previous two sections, the binary-connection scheme is chosen to build the handwritten digit recognition system. The binary-connected network has 24 feature maps (or 2722 trainable weights) and is trained on the entire training set (containing 60,000 samples) of the MNIST database, using QRProp. The trained network is then evaluated on the entire test set. The classification performance of this network is presented as a confusion matrix in Table 3. For each digit, the network has a recognition accuracy greater than 97%, except for the digits 5 and 9. The digit 9 has the worst recognition accuracy, 95.0%. Many nines are classified as fours, and vice versa. This is because some patterns in the test set are written with heavy strokes, which causes a four to appear as a nine and a nine as a four (see samples in Fig. 5). Nevertheless, the overall classification accuracy of the system is over 97.3%.


Table 3. Confusion matrix of the digit recognition system on the MNIST test set.

                Network predicted class
Actual class    0     1      2      3     4     5     6     7      8     9     Rate (%)
0               970   0      1      0     0     0     6     2      1     0     99.0
1               0     1120   2      2     0     0     2     1      8     0     98.7
2               7     1      1009   6     1     0     0     3      5     0     97.8
3               0     0      5      982   0     7     0     6      10    0     97.2
4               1     0      3      0     954   0     4     1      0     19    97.1
5               3     0      2      11    1     862   5     1      4     3     96.6
6               8     3      4      0     2     2     935   2      2     0     97.6
7               1     2      8      6     2     0     0     1003   0     6     97.6
8               4     2      1      6     4     5     1     3      945   3     97.0
9               4     4      0      10    11    7     1     5      8     959   95.0

Overall classification accuracy: 97.3%
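The per-class rates and the overall accuracy in Table 3 follow directly from the confusion matrix; a small sketch of that computation (ours):

import numpy as np

def class_rates(confusion):
    """Per-class rate (%): diagonal over row sums (rows = actual classes)."""
    return 100.0 * np.diag(confusion) / confusion.sum(axis=1)

def overall_accuracy(confusion):
    """Overall accuracy (%): trace over the total number of test samples."""
    return 100.0 * np.trace(confusion) / confusion.sum()

# e.g. for digit 9: 959 correct out of the 1009 test nines gives 95.0%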


Fig. 5. Examples of digit patterns in the test set that were misclassified: (a) digit four predicted as nine, and (b) digit nine predicted as four.

Table 4. Recognition error rates of different neural-based classifiers tested on the MNIST database. The second and third columns present the number of feature maps (F. maps) or neurons and the number of trainable weights (T. weights), respectively, in the networks.

NN Classifier                 No. of F. Maps/Neurons   No. of T. Weights   Error Rate (%)
3-layer MLP7                  1160                     936,660             2.95
LeNet-57                      164                      60,000              0.80
Boosted GCNN9                 176                      63,156              0.68
CoNN with cross entropy10     55                       127,540             0.40
CoNN24                        —                        18,370              1.20
SICoNNet                      24                       2,722               2.70

Table 4 shows the recognition error rates of different neural-based classifiers tested on the MNIST database, together with their network sizes. To the best of our knowledge, and from the list,8 the most successful classifier reported to date was developed by Simard et al.,10 with an error rate of 0.4%. However, their CoNN has the most trainable network parameters, apart from the three-layer MLP, with 127,540 trainable weights. This number of weights is computed from their given network structure, assuming that the 100 hidden neurons in the third hidden layer of the network are fully connected to the 50 feature maps of size 5 × 5 in the second hidden layer, and that each feature map has a single receptive field. Most of the classifiers based on convolutional neural networks have error rates of less than 1%, at the expense of having more than 10,000 trainable weights. Even though the same test set from the MNIST database is used to evaluate the performance of these networks, the size of the training set and the preprocessing applied to the training patterns are different; for example, LeNet-5 and the network implemented by Simard et al. were both trained on an augmented training set with artificially distorted versions of the original digit patterns, so as to accommodate all forms of affine transformations. The proposed CoNN, on the other hand, was trained and tested on binary images, and its recognition error rate is lower than that of the MLP, but higher than those of the existing CoNNs. However, it has the smallest number of trainable weights, with 24 feature maps in the hidden layers behaving as feature detectors. To improve the performance of the proposed CoNN for this pattern recognition task, a larger training set with distorted digit patterns can be used, and the network structure can be modified by adding another classification layer between the last feature extraction layer and the output layer; two classification layers would act as an MLP classifier applied to the extracted features generated at the last hidden layer of the proposed CoNN.

6. Conclusion

In this paper, we proposed the use of a new class of convolutional neural networks for handwritten digit recognition. These networks, known as shunting inhibitory convolutional neural networks, have a flexible network structure with three connection schemes: fully-connected, binary-connected and Toeplitz-connected. A hybrid training method (QRProp), derived from existing first-order training algorithms, was used to train the networks for handwritten digit recognition. The performance of QRProp was compared to that of the Levenberg–Marquardt algorithm. Experimental results show that the QRProp method has better convergence speed than the LM algorithm, in terms of training time, and achieves similar classification accuracy. Among the three different SICoNNet architectures (binary-, Toeplitz- and fully-connected networks), the binary-connected network has the best recognition rate. Evaluated on the MNIST database, a binary-connected network with 2722 trainable weights achieves a correct classification rate of 97.3%.

References

1. K. Fukushima, S. Miyake and T. Ito, Neocognitron: A neural network model for a mechanism of visual pattern recognition, IEEE Trans. Syst. Man Cybernet. SMC-13(5) (1983) 826–834.
2. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Comput. 1(4) (1989) 541–551.
3. C. Garcia and M. Delakis, A neural architecture for fast and robust face detection, in Proc. Sixteenth Int. Conf. Pattern Recogn., Quebec, Canada 2 (2002) 44–47.
4. S. Lawrence, C. L. Giles, A. C. Tsoi and A. D. Back, Face recognition: A convolutional neural network approach, IEEE Trans. Neural Networks 8(1) (1997) 98–113.
5. B. Fasel, Multiscale facial expression recognition using convolutional neural networks, in Proc. Third Indian Conf. Comput. Vision, Graphics Image Process., Ahmedabad, India (2002).
6. S.-C. B. Lo, J.-S. J. Lin, M. T. Freedman and S. K. Mun, Application of artificial neural networks to medical image pattern recognition: Detection of clustered microcalcifications on mammograms and lung cancer on chest radiographs, J. VLSI Signal Process. Syst. 18(3) (1996) 263–274.
7. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11) (1998) 2278–2324.
8. Y. LeCun, The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist.
9. A. Calderón, S. Roa and J. Victorino, Handwritten digit recognition using convolutional neural networks and Gabor filters, in Proc. Int. Congr. Comput. Intell., Medellin, Colombia (2003).
10. P. Y. Simard, D. Steinkraus and J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis, Proc. Seventh Int. Conf. Document Anal. Recogn. 2 (2003) 958–962.
11. D. Gorgevik and D. Cakmakov, An efficient three-stage classifier for handwritten digit recognition, Proc. 17th Int. Conf. Pattern Recogn. 4 (2004) 507–510.
12. F. H. C. Tivive and A. Bouzerdoum, Efficient training algorithms for a class of shunting inhibitory convolutional neural networks, IEEE Trans. Neural Networks 16(3) (2005) 541–556.
13. L. J. Borg-Graham, C. Monier and Y. Fregnac, Visual input evokes transient and strong shunting inhibition in visual cortical neurons, Nature 393(6683) (1998) 369–373.
14. J. S. Anderson, M. Carandini and D. Ferster, Orientation tuning of input conductance, excitation, and inhibition in cat primary visual cortex, J. Neurophysiol. 84 (2000) 909–926.
15. Y. Fregnac, C. Monier, F. Chavane, P. Baudot and L. Graham, Shunting inhibition, a silent step in visual computation, J. Physiol. 97 (2003) 441–451.
16. A. Bouzerdoum, A new class of high-order neural networks with nonlinear decision boundaries, in Proc. Sixth Int. Conf. Neural Inf. Process., Perth 3 (1999) 1004–1009.
17. A. Bouzerdoum, Classification and function approximation using feed-forward shunting inhibitory artificial neural networks, in Proc. IEEE-INNS-ENNS Int. Joint Conf. Neural Networks (2000) 613–618.
18. F. H. C. Tivive and A. Bouzerdoum, A face detection system using shunting inhibitory convolutional neural networks, in Proc. Int. Joint Conf. Neural Networks 4 (2004) 2571–2575.
19. B. Fasel, Robust face analysis using convolutional neural networks, in Proc. Sixteenth Int. Conf. Pattern Recogn., Quebec, Canada 2 (2002) 40–43.
20. M. Riedmiller and H. Braun, A direct adaptive method for faster backpropagation learning: The RPROP algorithm, Proc. IEEE Int. Conf. Neural Networks (1993) 586–591.
21. S. Fahlman, An empirical study of learning speed in back-propagation networks, Technical Report CMU-CS-88-162, Carnegie Mellon University (1988).
22. T. Tollenaere, SuperSAB: Fast adaptive back propagation with good scaling properties, Neural Networks 3 (1990) 561–573.
23. M. T. Hagan and M. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Networks 5 (1994) 989–993.
24. E. Poisson, C. V. Gaudin and P.-M. Lallican, Multi-modular architecture based on convolutional neural networks for online handwritten character recognition, Proc. 9th Int. Conf. Neural Inf. Process. 5 (2002) 2444–2448.
