INTEGRATED HANDWRITING RECOGNITION SYSTEM USING ARTIFICIAL NEURAL NETWORKS
RAILEANU, AnaMaria; CARSTOIU Dorin
Abstract: In this study we set out to demonstrate the high degree of security that can be offered by using ANNs as the basis of a biometric system. Neural networks based on a feedforward architecture are used in problem solving as universal approximators in concrete tasks such as classification (including nonlinearly separable classes), prediction and compression. The error backpropagation algorithm was used to train the multilayer perceptron network. The results showed that errors can be reduced, up to a point, by increasing the number of learning epochs and the number of input characters, and that there is still room for improvement.

Key words: neural networks, biometrics, character recognition

A multitude of types of neural networks have been proposed over time. In fact, neural networks have been studied so intensely (for example by IT engineers, electronics engineers, biologists and psychologists) that they have received a variety of names. Scientists refer to them as "Artificial Neural Networks", "Multilayer Perceptrons" and "Parallel Distributed Processors". Despite this, there is a small group of classical networks in common use, mainly networks trained with the backpropagation algorithm, Hopfield networks, "competitive" networks and networks that use spiking neurons.

Knowledge can be classified by its degree of generality. At the basic level are signals, which contain useful data as well as parasitic elements (noise). Data consist of elements that may be of potential interest. Processed data lead to information, driven by a specific interest. When a piece of information is subjected to a certain specialization, we obtain knowledge. Knowledge-based systems, depending on their purpose and type, can reason on their own, starting from signals, data and pieces of information; furthermore, such systems may also deal with metaknowledge.
Here are a number of reasons why we should study neural networks:
1. They are a viable alternative to the computational paradigm based on a formal model and on algorithms whose behaviour does not change during use.
2. They incorporate results from different fields of study in order to obtain simple computational architectures.
3. They model human intelligence, helping us to better understand how the human brain works.
4. They tolerate errors better, performing well even when the input data are flawed.

ANNs have multiple representational forms, but the most common is the mathematical one. For each artificial neuron, the mathematical form consists of a function g(x) of the input vector x.
1. INTRODUCTION
A biometric system is essentially a pattern recognition system, which makes a personal identification by determining the authenticity of a specific physiological or behavioral characteristic possessed by the user. Pattern recognition, as a branch of artificial intelligence, aims to identify similarity relationships between abstract representations of objects or phenomena; to recognize is to classify input data (forms) as belonging to certain classes, using classification criteria built from previously acquired information.

An important issue in designing a practical system is to determine how an individual is identified. Biometrics dates back to the ancient Egyptians, who measured people to identify them. Keeping to the basics, we propose the idea of identifying someone by his handwriting. Every person who wishes to enter a secured perimeter is required to write a random text, which is compared with previously taken samples of his handwriting. Depending on the result, a percentage that expresses the similarity, the person is or is not allowed to enter.

Biometric devices have three primary components: the first is an automated mechanism that scans and captures a digital or analog image of a living personal characteristic; the second handles compression, processing, storage and comparison of the image with the stored data; the third interfaces with application systems.
This study deals with the first part of the biometric system, illustrated by the use of artificial neural networks, which are, as their name indicates, computational networks that attempt to simulate the networks of nerve cells (neurons) of the biological central nervous system. The neural network is in fact a novel computer architecture and a novel algorithmization architecture relative to conventional computers. It allows very simple computational operations (additions, multiplications and fundamental logic elements) to be used to solve complex, mathematically ill-defined problems. A conventional algorithm employs complex sets of equations and applies only to one given problem. The ANN is computationally and algorithmically very simple, and its self-organizing feature allows it to hold for a wide range of problems.
Here $x = (x_1, x_2, \ldots, x_i)$ is the input vector. Each input $x_i$ is weighted according to its weight in $w = (w_1, w_2, \ldots, w_i)$, and $K$ is the postprocessing function that is finally applied. This results in the following equation for a single neuron:

$$ g(x) = K\left(\sum_i w_i x_i\right) \tag{1} $$
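Equation (1) can be illustrated with a minimal sketch. The function names, the example weights and the choice of a logistic function for K are assumptions made for illustration; the paper does not prescribe them at this point.

```python
import math

def neuron_output(x, w, K):
    """Compute g(x) = K(sum_i w_i * x_i) for a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return K(weighted_sum)

def logistic(s):
    """One possible postprocessing function K (an assumption for this example)."""
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical input vector and weights
x = [1.0, 0.0, 1.0]
w = [0.5, -0.3, 0.5]
print(neuron_output(x, w, logistic))
```

Passing K as a parameter keeps the weighted sum separate from the postprocessing step, so the same neuron can be evaluated with different activation functions.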
When interpreting the results we must take into account that handwritten text varies because of the loss of synchronism between the muscles of the hand, as well as because of changes in a person's style caused by several factors, including but not limited to education and mood. Reading handwriting is a very difficult task considering the diversity that exists in ordinary penmanship. However, progress is being made. Early devices read constrained handwritten entries, using non-reading inks to define specifically sized character boxes.
IntelliSec – The 1st International Workshop on Intelligent Security Systems 1124th November 2009, Bucharest, Romania
2. RELATED WORK
Obviously, if one is to solve a set of differential equations, one would not use an ANN. But problems of recognition, filtering and control are well suited to ANNs. As always, no tool or discipline can be expected to do it all. Moreover, ANNs are certainly still in their infancy: they started in the 1950s, and widespread interest in them dates from the early 1980s. All in all, ANNs deserve our serious attention.

One field that has developed from Character Recognition is Optical Character Recognition (OCR). OCR is widely used today in post offices, banks, airports, airline offices and businesses. Address readers sort incoming and outgoing mail, check readers in banks capture images of checks for processing, airline ticket and passport readers are used for tasks ranging from accounting for passenger revenues to checking database records, and form readers are used to read and process up to 5,800 forms per hour. OCR software is also used in scanners and fax machines, allowing the user to turn graphic images of text into editable documents. Newer applications have even expanded beyond characters: eye, face and fingerprint scans used in high-security areas employ a newer kind of recognition. Optical Character Recognition has also advanced into a newer field, Handwriting Recognition, which is likewise based on the simplicity of Character Recognition.

The basic principles of artificial neural networks (ANNs) were first formulated by McCulloch and Pitts in 1943, in terms of five assumptions:
1. The activity of a neuron (ANN) is all-or-nothing.
2. A certain fixed number of synapses larger than 1 must be excited within a given interval of neural addition for a neuron to be excited.
3. The only significant delay within the neural system is the synaptic delay.
4. The activity of any inhibitory synapse absolutely prevents the excitation of the neuron at that time.
5. The structure of the interconnection network does not change over time.
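Assumptions 1, 2 and 4 above can be sketched as a simple threshold unit. This is only an illustration: the function name, the binary input encoding and the threshold value of 2 are assumptions, and the timing aspects (assumption 3) are not modeled.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """All-or-nothing unit: returns 1 if the neuron fires, 0 otherwise.

    excitatory, inhibitory: lists of 0/1 values, one per synapse.
    threshold: fixed number of excited synapses required to fire.
    """
    # Assumption 4: any active inhibitory synapse absolutely prevents firing.
    if any(inhibitory):
        return 0
    # Assumption 2: a fixed number of excitatory synapses must be active.
    return 1 if sum(excitatory) >= threshold else 0

# With two excitatory synapses and a threshold of 2 the unit acts as a logical AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcculloch_pitts([a, b], [], threshold=2))
```

Because the connection structure and the threshold are fixed (assumption 5), such a unit does not learn; learning rules come with the later models discussed in this section.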
The Hebbian Learning Law (Hebbian Rule), due to Donald Hebb (1949), is also a widely applied principle. The Hebbian Learning Law states that: "When an axon of cell A is near enough to excite cell B and when it repeatedly and persistently takes part in firing it, then some growth process or metabolic change takes place in one or both these cells such that the efficiency of cell A is increased" [Hebb, 1949] (i.e. the weight of the contribution of the output of cell A to the above firing of cell B is increased).

Historically, the earliest ANNs are:
1. The Perceptron, proposed by the psychologist Frank Rosenblatt (Psychological Review, 1958).
2. The Artron (Statistical Switch-based ANN), due to R. Lee (1950s).
3. The Adaline (Adaptive Linear Neuron), due to B. Widrow (1960). This artificial neuron is also known as the ALC (adaptive linear combiner), the ALC being its principal component. It is a single neuron, not a network.
4. The Madaline (Many Adaline), also due to Widrow (1988). This is an ANN (network) formulation based on the Adaline above.
5. The Back-Propagation network, a multilayer Perceptron-based ANN that gives an elegant solution to hidden-layer learning [Rumelhart et al., 1986 and others].
6. The Hopfield Network, due to John Hopfield (1982).
7. The Counter-Propagation Network [Hecht-Nielsen, 1987], where Kohonen's Self-Organizing Mapping (SOM) is used to facilitate unsupervised learning (absence of a "teacher").

The other networks, such as ART, Cognitron, LAMSTAR, etc., incorporate certain elements of these fundamental networks or use them as building blocks, usually in combination with other decision elements, statistical or deterministic, and with higher-level controllers.

The Adaptive Resonance Theory (ART) was originated by Carpenter and Grossberg (1987a) for the purpose of developing artificial neural networks whose manner of performance, especially (but not only) in pattern recognition or classification tasks, is closer to that of the biological neural network (NN). Since the purpose of the ART neural network is to closely approximate the biological NN, it needs no "teacher" but functions as an unsupervised self-organizing network. Its ART-I version deals with binary inputs. The extension of ART-I known as ART-II [Carpenter and Grossberg, 1987b] deals with both analog patterns and patterns represented by different levels of grey.

The cognitron, as its name implies, is a network designed mainly with the recognition of patterns in mind. To do this, the cognitron network employs both inhibitory and excitatory neurons in its various layers. It was first devised by Fukushima (1975) and is an unsupervised network, so that it resembles the biological neural network in that respect.

The LAMSTAR (LArge Memory STorage And Retrieval) network is not a specific network but a system of networks for storage, recognition, comparison and decision that, in combination, allow such storage and retrieval to be accomplished.
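The Hebbian Learning Law quoted earlier in this section is stated only in words. One common way to formalize it, given here as an assumption rather than as the paper's own notation, is a weight change proportional to the joint activity of the two cells, where $w_{AB}$ is the weight of the connection from cell A to cell B, $y_A$ and $y_B$ are their outputs, and $\eta > 0$ is a hypothetical learning rate:

```latex
\Delta w_{AB} = \eta \, y_A \, y_B
```

When both cells are active at the same time the product is positive and the weight grows, which matches the statement that the contribution of cell A to the firing of cell B is increased.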
3. POSSIBILITY OF HARDWARE IMPLEMENTATION
Most physical implementations of neural systems are based on the mathematical model of McCulloch and Pitts (1943). The main issues raised by the synthesis of artificial systems that simulate actual behavior are the number and nature of the real biological features to be reproduced, starting with the connectivity matrix of the elements, whose size increases with the square of their number, and the processing time, which must be independent of the size of the network. Complex neural networks produce temporal variations of the network parameters and can perform mathematical operations more sophisticated than a mere summation of the signals. Consequently, the processing elements are organized in several layers: an input layer, an output layer and one or more hidden layers. A physical ANN implementation should incorporate as many of the physiological and operational characteristics of the mathematical models as possible.

We can highlight three main approaches to the physical modeling of neurons and, implicitly, of artificial neural networks, considering the advantages and limitations of the technology:
a) analog modeling with gain-controlled amplifiers and resistive synapses;
b) ANN modeling with semi-parallel shift registers;
c) electro-optical modeling of ANNs.

The main trend in implementing ANNs as semiconductor integrated circuits (ICs) is to increase the density of circuit components per unit area. The achievable level of integration is limited by the connectivity matrix, whose size increases with the square of the number of dynamic processing units (neurons). In essence, the connectivity matrix is physically made as a network of perforations arranged in an insulating material, into which conductive material (usually polycrystalline silicon) is injected. Two sets of metal interconnections are fixed on the two sides of the insulating material, corresponding to the inputs (dendrites) and outputs (axons) of the amplifiers (neurons). Positive and negative synapses are physically obtained by doubling each neuron. The value of each resistor is determined by the cross-section of the hole and corresponds to the inverse of the synaptic efficacy. The circuit can be integrated both in standard bipolar technology, with a modest level of integration, and in CMOS technology.

To avoid the compromise between a high level of connectivity and the inaccessibility of the synaptic contacts, we could use a neural network implemented with CCD (Charge Coupled Device) microelectronic circuits, because CCD shift registers can store discrete groups of electrons in well-defined positions, which can then be moved quickly by applying an external potential while keeping their local value.

Fig. 1. A neural network architecture implemented with CCD shift registers

The circuit shown in Fig. 1 largely avoids the limitations imposed by the high degree of connectivity, being characterized by an easily accessible and modifiable synaptic matrix. On the other hand, the proposed arrangement partially sacrifices the parallelism and the asynchronous signal processing that gave rise to the original idea. It is true that the relatively high speed of CCD circuits partially compensates for the non-parallel data processing. Currently, shift registers containing up to 2000 CCD circuits operating at frequencies of 10 MHz can be implemented.

4. PROPOSED SOFTWARE ARCHITECTURE

In order to obtain good results, a feedforward network was chosen for implementing the integrated system, consisting of 150 neurons in the input layer, 250 neurons in the hidden layer and 16 neurons in the output layer, which represent the characters of the alphabet in binary code, each uniquely represented on 16 bits. Inserting hidden layers enhances the representational capacity of feedforward networks but raises difficulties in terms of learning, as "delta"-type algorithms cannot be applied directly. This was one of the main reasons for the stagnation in the development of feedforward networks with supervised learning between 1969 (when Papert and Minsky highlighted the limits of single-layer networks) and 1985 (when the backpropagation algorithm, developed in parallel by several researchers, became known).

In determining the number of neurons in each layer, the following were taken into account. Both the input layer and the output layer should have as many units as needed to represent the input data and the output data, respectively. The number of hidden units should be just enough to solve the problem, but no higher than necessary. The number of hidden units is based either on theoretical results concerning the representational capacity of the architecture (as in the case of the network chosen here) or on heuristic rules (e.g. for a network with N input units, M output units and one hidden layer, the size of the hidden layer can be chosen as M ⋅ N). If the number of hidden-layer neurons is too small, the network fails to form an adequate internal representation of the training data, and the classification error will therefore be high. If the number is too large, the network learns the training data very well but turns out to be incapable of good generalization, producing high error levels on the test data.

The input vector therefore consists of 150 components representing the elements of a 10x15 pixel matrix in binary representation. The matrix size was chosen considering the average size of the characters represented, with a minimum of introduced noise.

The learning algorithm used for the network is the well-known backpropagation algorithm, proposed in 1986 by Rumelhart, Hinton and Williams for setting the weights, and hence for training multilayer perceptrons. Learning proceeds as follows: the network's weights are initialized with random numbers, usually between −1 and 1. The next step consists of applying the set of input data and calculating the output (this step is called the "forward step"). The calculation yields a result completely different from the target, because all the weights have random values. At this point the error of each neuron is calculated, usually as: Target − Effective output. This error is then used to modify the weights so that the error becomes progressively smaller. The process is repeated until the error is minimal. The learning rate (η) speeds up or slows down the learning process, as appropriate. We chose a learning rate of 150 as appropriate for this system, but the user may modify it in the range 1 to 200 and configure it according to his own needs.

The detection of symbols is a very important part of the program. It is based on the premise that we are dealing only with black-and-white bitmap images of any resolution, where white, ARGB (255,255,255,255), denotes empty space and black, ARGB (255,0,0,0), denotes a character. It is also assumed that the image contains only characters; any other element (table line, edge, etc.) is considered noise.
Fig. 2. Neural Network Chosen Architecture

We should also mention that each training set consists of an image and a text file containing the desired output. To let the user customize the application, we provide the option of choosing an activation function. To fully understand the mechanism, we should note that in biologically inspired neural networks the activation function is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary, that is, either the neuron is firing or not. The function looks like φ(vi) = U(vi), where U is the Heaviside step function. In this case a large number of neurons must be used for computation beyond linear separation of categories.

Activation functions for the hidden units are needed to introduce nonlinearity into the network. Without nonlinearity, hidden units would not make nets more powerful than plain perceptrons (which have no hidden units, just input and output units). The reason is that a linear function of linear functions is again a linear function. It is the nonlinearity (i.e. the capability to represent nonlinear functions) that makes multilayer networks so powerful. Almost any nonlinear function does the job, except for polynomials. For hidden units, sigmoid activation functions are usually preferable to threshold activation functions. Here they are: the standard sigmoid function, which ranges from 0 to 1:
$$ y = \frac{1}{1 + e^{-D \cdot x}} \tag{2} $$

the hyperbolic tangent, which ranges from −1 to 1:

$$ y = \frac{2}{1 + e^{-2 \cdot x}} - 1 \tag{3} $$

the Gauss function:

$$ y = e^{-x^{2}} \tag{4} $$

another sigmoid usually used:

$$ y = \frac{x}{1 + |x|} \tag{5} $$

As for the last function, if the network output is a set of numerical values, it will require more iterations to reach the target value. But if the problem is one of classification, as in this case, this function is appropriate because it requires less processing time from the central unit, without affecting the number of iterations. Networks with threshold units are difficult to train because the error function is stepwise constant; hence the gradient either does not exist or is zero, making it impossible to use backprop or more efficient gradient-based training methods. Considering all of the above, the user can choose among the following: unipolar sigmoid, bipolar sigmoid, linear function, Heaviside or Gauss functions.

Most supervised learning algorithms rely on minimizing an error function using a gradient-type method; the general structure therefore comprises two stages: the initialization of the parameters and the iterative adjustment process. The operations performed by the implemented program fall into two classes:
1. Training phase
1.1. Analysis of image and character separation;
1.2. Convert symbols to pixel matrices;
1.3. Search for the desired output and convert to ASCII;
1.4. Matrix linearization and sending to the network input;
1.5. Output calculation;
1.6. Comparing the obtained output to the desired output and calculation of the error;
1.7. Adjusting the weights accordingly until the maximum number of iterations is reached.
2. Testing phase
2.1. Analysis of image and character separation;
2.2. Convert symbols to pixel matrices;
2.3. Output calculation;
2.4. Displaying the recognized character.

5. EXPERIMENTAL RESULTS

5.1. Results obtained for variation in the number of training epochs are given in the following tables.
Activation function: bipolar sigmoid. Number of symbols = 90, Learning rate = 150

Used Font          50 epochs             200 epochs            500 epochs
                   Misid char   Error    Misid char   Error    Misid char   Error
Arial              19           24%      2            3%       4            4.5%
Tahoma             13           14.5%    5            5.6%     3            3.4%
Times New Roman    11           12.3%    4            4.5%     1            1.2%

Used Font          800 epochs            1000 epochs           2000 epochs
                   Misid char   Error    Misid char   Error    Misid char   Error
Arial              2            3%       1            1.2%     2            3%
Tahoma             2            3%       2            3%       2            3%
Times New Roman    2            3%       1            1.2%     1            1.2%

Used Font          3000 epochs
                   Misid char   Error
Arial              0            0%
Tahoma             1            1.2%
Times New Roman    0            0%
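The training loop (steps 1.5 to 1.7 of the training phase) and the role of the number of epochs can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny 2-4-1 network, the toy logical-OR patterns, the bias inputs, the fixed random seed and the learning rate of 0.2 are all assumptions chosen to keep the example small, and the paper's own rate of 150 refers to its own scaling.

```python
import math
import random

def bipolar_sigmoid(s):
    # Equation (3): 2 / (1 + e^(-2s)) - 1, which is the same as tanh(s)
    return math.tanh(s)

def forward(x, W1, W2):
    """Forward step: return the hidden activations and the network outputs."""
    h = [bipolar_sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in W1]
    y = [bipolar_sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in W2]
    return h, y

def train(patterns, n_hidden, epochs, rate, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    # Weights start as random numbers between -1 and 1; the last column is a bias.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, target in patterns:
            h, y = forward(x, W1, W2)
            # Output error (target - effective output) times the derivative 1 - y^2
            d_out = [(t - yk) * (1 - yk * yk) for t, yk in zip(target, y)]
            # Error propagated back to the hidden layer (uses W2 before its update)
            d_hid = [(1 - hj * hj) * sum(d_out[k] * W2[k][j] for k in range(n_out))
                     for j, hj in enumerate(h)]
            for k in range(n_out):
                for j, hj in enumerate(h + [1.0]):
                    W2[k][j] += rate * d_out[k] * hj
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    W1[j][i] += rate * d_hid[j] * xi
    return W1, W2

def mean_squared_error(patterns, W1, W2):
    total = 0.0
    for x, target in patterns:
        _, y = forward(x, W1, W2)
        total += sum((t - yk) ** 2 for t, yk in zip(target, y))
    return total / len(patterns)

def misidentified(patterns, W1, W2):
    """Count patterns whose thresholded output differs from the target code."""
    wrong = 0
    for x, target in patterns:
        _, y = forward(x, W1, W2)
        if [1.0 if yk >= 0 else -1.0 for yk in y] != target:
            wrong += 1
    return wrong

# Toy data (assumption): four 2-bit inputs with a 1-bit bipolar target (logical OR)
patterns = [([0.0, 0.0], [-1.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
for epochs in (0, 50, 500):
    W1, W2 = train(patterns, n_hidden=4, epochs=epochs, rate=0.2)
    print(epochs, misidentified(patterns, W1, W2),
          round(mean_squared_error(patterns, W1, W2), 4))
```

Running the loop for several epoch counts with the same seed mirrors the experiment design, in which only the number of epochs changes between runs while the starting weights and data stay the same.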
Fig. 3. Variations in the number of epochs for different font styles (horizontal axis: number of epochs, 50 to 2000; vertical axis: number of misidentified characters; series: A, T, TNR)

5.2. Results obtained for variation in the number of input characters
Activation function: bipolar sigmoid. Number of epochs = 300, Learning rate = 150

Used Font            20 characters             50 characters             90 characters
                     Misid char   Error (%)    Misid char   Error (%)    Misid char   Error (%)
Latin Arial          0            0            6            12           11           12.22
Latin Tahoma         0            0            3            6            8            8.89
Latin Times Roman    0            0            2            4            9            10

5.3. Results obtained for variation of the learning rate
Activation function: bipolar sigmoid. Number of symbols = 90, Number of epochs = 300

Used Font       Learning rate 1       Learning rate 10      Learning rate 40
                Misid char   Error    Misid char   Error    Misid char   Error
Arial           56           63%      6            6.78%    5            5.6%
Tahoma          70           78%      8            8.9%     10           11.2%
Times Roman     48           54%      4            4.5%     3            3.4%

Used Font       Learning rate 80      Learning rate 120
                Misid char   Error    Misid char   Error
Arial           2            2.33%    0            0%
Tahoma          2            2.33%    2            2.33%
Times Roman     0            0%       0            0%

Fig. 4. Variations of the learning rate for different font styles (horizontal axis: learning rate, 1 to 120; vertical axis: number of misidentified characters; series: A, T, TNR)

6. POSSIBLE IMPROVEMENTS

The motivation for developing new versions of the standard BP algorithm is that it has a number of drawbacks:
§ Slow convergence: it requires too many epochs to reach a value for which the error is low enough.
§ Blocking in a local minimum: once the algorithm reaches a local minimum of the error function, it cannot escape from this minimum to reach the global optimum.
§ Stagnation (paralysis): the algorithm stagnates in a region that is not necessarily near a local minimum, because the adjustments of the parameters are very small.
§ Overtraining: the network provides a good approximation on the training set but has a low generalization ability.

Starting from standard BP, variants of the backpropagation algorithm can be developed that differ in:
§ How the learning rate is chosen: constant or adaptive;
§ The adjustment rules (determined by the minimization algorithm used, which differs from the simple gradient algorithm: conjugate-gradient-type algorithms, Newton-type algorithms, random descent algorithms, genetic algorithms, etc.);
§ The way the parameters are initialized: random initialization or initialization based on a search algorithm;
§ The order in which the training set is traversed (this only influences the serial version of the algorithm): sequential or random;
§ The error function: besides the mean squared error, measures of error specific to the problem can be used (e.g. for classification problems an entropy-based error can be used);
§ The stopping criterion: in addition to the criteria based on the maximum number of epochs and on the training-set error, we can use criteria related to the validation-set error and to the size of the adjustments in the last epoch.

7. CONCLUSIONS
The NN was trained and tested with different training and test patterns.
§ In all cases the degree of convergence and the error rate were observed.
§ The convergence depended greatly on the number of hidden layers and on the number of neurons in each hidden layer.
§ The number of neurons in each hidden layer should be neither too low nor too high.
§ Once properly trained, the NN was very accurate in classifying data in most of the test cases. The amount of error observed makes it ideal for classification problems like Face Detection.

There are a few things worth mentioning about this system:
§ If the learning rate is subunit (less than 1), the network cannot handle the learning process, and the number of correctly recognized characters tends to zero.
§ Tests were also performed on the recognition of character sets other than the ones initially learned by the network. Unfortunately, the results were not very encouraging.
§ The experimental results show that, in general, increasing the number of epochs has a positive effect on the network's performance, up to the point where the network reaches its optimum. If the number of epochs is increased further, the network tends to become unstable and the number of wrongly recognized characters increases. This is called "overlearning".
§ The size of the input set is also very important for network performance. The more symbols the network must learn, the greater the likelihood of errors.

In conclusion, for a learning set of at most 90 symbols, the network requires 250 neurons in the hidden layer. In the current study the best results were provided by the additive model, the bipolar sigmoid activation function, the feedforward architecture and a supervised learning algorithm, backpropagation. We observed that learning was more effective when the input set had a reduced number of items, and that optimum results were obtained when the test image contained a small number of words, preferably words with as many repeated letters as possible. The characters most likely to give rise to errors were "H", "I", "L" and "g". Digits caused recognition errors only if the network was trained for an insufficient number of epochs (<100).

It has been shown that the optimal number of hidden neurons is generally impossible to predict accurately before the network's performance has been evaluated in several sets of experiments. The same holds for the total number of training epochs and the final value of the mean squared error: the optimal values of these parameters can be determined only experimentally, and they depend on the type and structure of the artificial neural network.

Handwriting recognition using neural networks has seen many implementations, but none has achieved performance acceptable for use in commercial applications. Such systems lack reliability and robustness, which may be achieved only through extensive research and decades of experiments.

In summary, the excitement about ANNs should not be limited to their greater resemblance to the human brain. Even their degree of self-organizing capability can be built into conventional digital computers using complicated artificial intelligence algorithms. The main contribution of ANNs is that, in their gross imitation of the biological neural network, they allow very low-level programming to solve complex problems, especially those that are non-analytical and/or nonlinear and/or nonstationary and/or stochastic, and to do so in a self-organizing manner that applies to a wide range of problems with no reprogramming or other interference in the program itself. The insensitivity to partial hardware failure is another great attraction, but only when dedicated ANN hardware is used. Given enough entrepreneurial designers and sufficient research and development dollars, OCR can become a powerful tool for future data entry applications.
8. REFERENCES
Danciu D., Răsvan V. (2008). Neural networks. Equilibria, Synchronization, Delays, Information Science Reference (Idea Group Inc.), ISBN 9781599048499, U.S.A.
Danciu D. (2008). Neural Networks Dynamics as Systems with Several Equilibria, Information Science Reference (Idea Group Inc.), ISBN 9781599049960, U.S.A.
Fausett L. (1994). Fundamentals of Neural Networks, PrenticeHall, ISBN 0 13 042250 9, U.S.A.
Graupe D. (2007). Principles of Artificial Neural Networks, World Scientific Publishing, ISBN 13 9789812706249, Singapore
Gurney K. (1997). An Introduction to Neural Networks, UCL Press, ISBN 1 85728 503 4, U.S.A.
Stanasila, O. & Neagoe, V. (2000). Teoria Recunoasterii Formelor, Editura Academiei Romane, ISBN 9732703415, Romania
*** (2009) http://franck.fleurey.free.fr/FaceDetection/index.htm, Accessed on: 2009-10-20
*** (2009) http://page.mi.fuberlin.de/rojas/neural/chapter/K3.pdf, Accessed on: 2009-10-14
*** (2009) http://www.ineweb.org/, Accessed on: 2009-10-14
*** (2009) http://free.beyonddream.com/NeuralNet/fundamental.htm, Accessed on: 2009-10-11