
Introduction

A lot of people today are trying to write their own OCR (Optical Character Recognition) system or to improve the quality of an existing one. This article shows how the use of an artificial neural network simplifies development of an optical character recognition application, while achieving high quality of recognition and good performance.

Background

Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems are usually quite complex and can hide a lot of logic behind the code. Using an artificial neural network in OCR applications can dramatically simplify the code and improve the quality of recognition while achieving good performance. Another benefit of using a neural network in OCR is the extensibility of the system: the ability to recognize more character sets than initially defined. Most traditional OCR systems are not extensible enough. Why? Because a task such as working with tens of thousands of Chinese characters, for example, is not as easy as working with a 68-character English typed character set, and it can easily bring a traditional system to its knees! Well, the Artificial Neural Network (ANN) is a wonderful tool that can help to resolve such problems. The ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an ANN is its topology: the ANN consists of a large number of highly interconnected processing elements (nodes) that are tied together with weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons, and this is true for ANNs as well. Learning typically occurs by example through training, or exposure to a set of input/output data (patterns), where the training algorithm adjusts the link weights. The link weights store the knowledge necessary to solve specific problems.

Originating in the late 1950s, neural networks did not gain much popularity until the 1980s, a computer boom era. Today ANNs are mostly used for the solution of complex real-world problems. They are often good at solving problems that are too complex for conventional technologies (e.g., problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to find) and are often well suited to problems that people are good at solving but for which traditional methods are not. They are good pattern recognition engines and robust classifiers, with the ability to generalize when making decisions based on imprecise input data. They offer ideal solutions to a variety of classification problems such as speech, character and signal recognition, as well as functional prediction and system modeling where the physical processes are not understood or are highly complex. The advantage of ANNs lies in their resilience against distortions in the input data and their capability to learn.
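The idea that "the training algorithm adjusts the link weights" can be seen in a toy example. The following sketch is my own illustration (none of these names come from any library discussed here): a single threshold unit learns the logical OR function by repeatedly nudging its weights toward the desired output.

```c
#include <assert.h>

/* Toy illustration of learning by weight adjustment: a single
   threshold unit learns logical OR with the perceptron rule.
   All names here are illustrative only. */
static double w0, w1, w2;   /* bias and two link weights */

static int fire(double a, double b)
{
    return (w0 + w1 * a + w2 * b) > 0.0 ? 1 : 0;
}

void train_or(void)
{
    double in[4][2]  = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    int    target[4] = { 0,      1,      1,      1     };
    double rate = 0.1;
    w0 = w1 = w2 = 0.0;
    for (int epoch = 0; epoch < 100; epoch++)
        for (int p = 0; p < 4; p++) {
            int err = target[p] - fire(in[p][0], in[p][1]);
            /* adjust the link weights toward the desired output */
            w0 += rate * err;
            w1 += rate * err * in[p][0];
            w2 += rate * err * in[p][1];
        }
}
```

After a few epochs the weights stop changing and the unit reproduces the whole truth table, which is exactly the "knowledge stored in the link weights" described above.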

**Using the code**

In this article I use a sample application from the Neuro.NET library to show how to use a Backpropagation neural network in a simple OCR application. Let's assume that you have already gone through all the image pre-processing routines (resampling, deskewing, zoning, blocking, etc.) and you already have images of the characters from your document. (In the example I simply generate those images.)

**Creating the neural network**

Let's construct the network first. In this example I use a Backpropagation neural network: a multilayer perceptron model with an input layer, one or more hidden layers, and an output layer.

The nodes in the Backpropagation neural network are interconnected via weighted links, with each node usually connecting to the next layer up, until the output layer, which provides the output of the network. The input pattern values are presented and assigned to the input nodes of the input layer. The input values are initialized to values between -1 and 1. The nodes in the next layer receive the input values through links and compute output values of their own, which are then passed to the next layer. These values propagate forward through the layers until the output layer is reached, or, put another way, until each output layer node has produced an output value for the network. The desired output for the input pattern is used to compute an error value for each node in the output layer, which is then propagated backwards through the network (and this is where the network's name comes from) as the delta rule is used to adjust the link weights to produce output closer to the desired result. Once the error produced by the patterns in the training set is below a given tolerance, the training is complete; the network can then be presented with new input patterns and will produce an output based on the experience it gained from the learning process. I will use the library class BackPropagationRPROPNetwork to construct my own OCRNetwork.
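The delta-rule step described above can be written out for a single link into an output node. The fragment below is only a sketch of the idea in C (the function name and signature are mine): the output error is scaled by the sigmoid derivative out*(1-out) and used to nudge one link weight.

```c
#include <assert.h>
#include <math.h>

/* One backpropagation update for a single link into an output node.
   delta = (desired - actual) * sigmoid'(net), with sigmoid' expressed
   through the node's output as out * (1 - out). Illustrative only. */
double backprop_step(double weight, double input,
                     double out, double desired, double rate)
{
    double delta = (desired - out) * out * (1.0 - out);
    return weight + rate * delta * input;
}
```

If the node's output is below the desired value, the weight grows; if it overshoots, the weight shrinks, which is exactly the backward error propagation the text describes.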


```csharp
//Inherit from Backpropagation neural network
public class OCRNetwork: BackPropagationRPROPNetwork
{
    //Override method of the base class in order to implement our
    //own training method
    public override void Train(PatternsCollection patterns)
    {
        ...
    }
}
```

I override the Train method of the base class to implement my own training method. Why do I need to do that? For one simple reason: the training progress of the network is measured by the quality of the produced result and the speed of training. You have to establish the criteria for when the quality of the network output is acceptable to you and when you can stop the training process. The implementation I provide here has proven (in my experience) to be fast and accurate. I decided that I can stop the training process when the network is able to recognize all of the patterns without a single error. So, here is the implementation of my training method:


```csharp
public override void Train(PatternsCollection patterns)
{
    //Current iteration number
    int i = 0;
    if (patterns != null)
    {
        int good = 0;
        // Train until all patterns are correct
        while (good < patterns.Count)
        {
            good = 0;
            for (i = 0; i < patterns.Count; i++)
            {
                //Set the input values of the network
                for (int k = 0; k < NodesInLayer(0); k++)
                    nodes[k].Value = patterns[i].Input[k];
                //Run the network
                this.Run();
                //Set the expected result
                for (int k = 0; k < this.OutputNodesCount; k++)
                    this.OutputNode(k).Error = patterns[i].Output[k];
                //Make the network remember the corresponding output
                //values. (Teach the network)
                this.Learn();
                //See if the network produced the correct result during
                //this iteration
                if (BestNodeIndex == OutputPatternIndex(patterns[i]))
                    good++;
            }
            //Adjust the weights of the links in the network to their
            //average value. (An epoch training technique)
            foreach (NeuroLink link in links)
                ((EpochBackPropagationLink)link).Epoch(patterns.Count);
        }
    }
}
```

Also, I have implemented a BestNodeIndex property that returns the index of the node having the maximum value and the minimal error. An OutputPatternIndex method returns the index of the pattern output element having a value of 1. If those indices match, the network has produced a correct result. Here is how the BestNodeIndex implementation looks:

```csharp
public int BestNodeIndex
{
    get
    {
        int result = -1;
        double aMaxNodeValue = 0;
        double aMinError = double.PositiveInfinity;
        for (int i = 0; i < this.OutputNodesCount; i++)
        {
            NeuroNode node = OutputNode(i);
            //Look for a node with maximum value or lesser error
            if ((node.Value > aMaxNodeValue) ||
                ((node.Value >= aMaxNodeValue) && (node.Error < aMinError)))
            {
                aMaxNodeValue = node.Value;
                aMinError = node.Error;
                result = i;
            }
        }
        return result;
    }
}
```

```csharp
//Create an instance of the network
backpropNetwork = new OCRNetwork(new int[3] {
    aMatrixDim * aMatrixDim,
    (aMatrixDim * aMatrixDim + aCharsCount) / 2,
    aCharsCount });
```

As simple as it gets, I create the instance of the neural network. The network has one constructor parameter: an integer array describing the number of nodes in each layer of the network.

The first layer in the network is the input layer. The number of elements in this layer corresponds to the number of elements in the input pattern and is equal to the number of elements in the digitized image matrix (we will talk about it later).

The network may have multiple middle layers with a different number of nodes in each layer. In this example I use only one middle layer and apply an "unofficial rule of thumb" to determine the number of nodes in it:

NodesNumber = (InputsCount + OutputsCount) / 2

Note: you can experiment by adding more middle layers and using a different number of nodes in them, just to see how it affects the training speed and recognition quality of the network.

The last layer in the network is the output layer. This is the layer where we look for the results. I define the number of nodes in this layer to be equal to the number of characters that we are going to recognize.
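The rule of thumb above is easy to sanity-check for concrete sizes. A one-line helper (mine, not part of the library) shows, for example, what a 10x10 input matrix with 26 output characters gives:

```c
#include <assert.h>

/* "Unofficial rule of thumb" for the middle-layer size:
   the mean of the input and output counts (integer division). */
int hidden_nodes(int inputs, int outputs)
{
    return (inputs + outputs) / 2;
}
```

For a 10x10 matrix and 26 characters this yields (100 + 26) / 2 = 63 middle-layer nodes.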

**Creating training patterns**

Now let's talk about the training patterns. Those patterns will be used to teach the neural network to recognize the images. Basically, each training pattern consists of two single-dimensional arrays of float numbers: the Inputs and Outputs arrays.

```csharp
/// <summary>
/// A class representing a single training pattern used to train a
/// neural network. Contains input data and expected results arrays.
/// </summary>
public class Pattern: NeuroObject
{
    private double[] inputs, outputs;
    ...
}
```

The Inputs array contains your input data, in our case a digitized representation of the character's image. The Inputs of each pattern are set to the digitized image data, and the corresponding element of the Outputs array is set to 1, so the network will know which output (letter) corresponds to the input data. By "digitizing" the image I mean the process of creating a brightness map (or a map of the absolute value of the color vector, whatever you choose) of the image. To create this map I split the image into squares and calculate the average value of each square. Then I store those values into an array. I have implemented the CharToDoubleArray method of the network to digitize the image; there I use the absolute value of the color for each element of the matrix. (No doubt you can use other techniques there...) After the image is digitized, I have to scale down the results in order to fit them into a range from -1 to 1, to comply with the input value range of the network. To do this I wrote a Scale method, where I look for the maximum element value of the matrix and then divide all the elements of the matrix by it.
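The scaling step, find the largest element and divide everything by it, fits in a few lines. This is my own illustration of the idea, not the library's actual Scale method:

```c
#include <assert.h>

/* Scale a digitized matrix in place so that its values fall into
   the 0..1 range: find the maximum element and divide all
   elements by it. */
void scale(double *m, int n)
{
    double max = 0.0;
    for (int i = 0; i < n; i++)
        if (m[i] > max)
            max = m[i];
    if (max > 0.0)                 /* avoid dividing by zero */
        for (int i = 0; i < n; i++)
            m[i] /= max;
}
```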

The Outputs array of the pattern represents the expected result, the result that the network will use during the training. There are as many elements in this array as there are characters we are going to recognize, and each element corresponds to a single letter. So, to teach the network to recognize the English letters from "A" to "Z", we will need 26 elements in the Outputs array (make it 52 if you decide to include lower-case letters). The implementation of CharToDoubleArray looks like this:

```csharp
//aSrc - an image of the character
//aArrayDim - dimension of the pattern matrix
double[] result = new double[aArrayDim * aArrayDim];
//calculate image quantization X step
double xStep = (double)aSrc.Width / (double)aArrayDim;
//calculate image quantization Y step
double yStep = (double)aSrc.Height / (double)aArrayDim;
for (int i = 0; i < aSrc.Width; i++)
    for (int j = 0; j < aSrc.Height; j++)
    {
        //calculate matrix address
        int x = (int)(i / xStep);
        int y = (int)(j / yStep);
        //Get the color of the pixel
        Color c = aSrc.GetPixel(i, j);
        //Absolute value of the color; it is possible to
        //use the B component of the Alpha color space too.
        result[y * aArrayDim + x] +=
            Math.Sqrt(c.R * c.R + c.G * c.G + c.B * c.B);
    }
//Scale the matrix to fit values into a range from 0..1 (required by
//the ANN). In the Scale method we look for the maximum value of the
//elements and then divide all elements of the matrix by this maximum.
return Scale(result);
```
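Since the C# version above depends on GDI+ (`Bitmap.GetPixel`), here is a minimal, library-free sketch of the same quantization scheme in C, operating on a plain grayscale byte array. All names here are mine and only illustrate the idea:

```c
#include <assert.h>

/* Digitize a w x h grayscale image into a dim x dim matrix by
   accumulating the brightness of the source pixels that fall into
   each cell (the same scheme as CharToDoubleArray, minus GDI+). */
void digitize(const unsigned char *img, int w, int h,
              double *out, int dim)
{
    double xStep = (double)w / dim;   /* image quantization X step */
    double yStep = (double)h / dim;   /* image quantization Y step */
    for (int i = 0; i < dim * dim; i++)
        out[i] = 0.0;
    for (int i = 0; i < w; i++)
        for (int j = 0; j < h; j++) {
            int x = (int)(i / xStep); /* matrix address */
            int y = (int)(j / yStep);
            out[y * dim + x] += img[j * w + i];
        }
}
```

A subsequent scaling pass (dividing by the maximum cell value) would bring the cells into the network's input range, as the article describes.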

The method CreateTrainingPatterns does this job for me:

```csharp
public PatternsCollection CreateTrainingPatterns(Font font)
{
    //Create pattern collection:
    // as many inputs as there are elements in the digitized image matrix,
    // as many outputs as there are characters we are going to recognize.
    PatternsCollection result = new PatternsCollection(aCharsCount,
        aMatrixDim * aMatrixDim, aCharsCount);
    // generate one pattern for each character
    for (int i = 0; i < aCharsCount; i++)
    {
        //CharToDoubleArray creates an image of the character and digitizes it.
        //You can change this method to pass the actual image of the character.
        double[] aBitMatrix = CharToDoubleArray(
            Convert.ToChar(aFirstChar + i), font, aMatrixDim, 0);
        //Assign matrix values as inputs of the pattern
        for (int j = 0; j < aMatrixDim * aMatrixDim; j++)
            result[i].Input[j] = aBitMatrix[j];
        //Output value set to 1 for the corresponding character.
        //The rest of the outputs are set to 0 by default.
        result[i].Output[i] = 1;
    }
    return result;
}
```

Now we have completed the creation of the patterns and we can use them to train the neural network.

**Training of the network**

To start the training process of the network, simply call the Train method and pass your training patterns to it:

```csharp
//Train the network
backpropNetwork.Train(trainingPatterns);
```

Normally, the execution flow will leave this method when training is complete, but in some cases it could stay there forever (!). The Train method is currently implemented relying on only one assumption: that the network training will be completed sooner or later. Well, I admit this is a wrong assumption, and network training may never complete. The most "popular" reasons for neural network training failure, with possible solutions, are listed below.

Training never completes because:

1. The network topology is too simple to handle the amount of training patterns you provide. Possible solution: you will have to create a bigger network. Add more nodes into the middle layer or add more middle layers to the network.

2. The training patterns are not clear enough, not precise, or are too complicated for the network to differentiate them. Possible solution: clean the patterns or use a different type of network/training algorithm. Also, you cannot train the network to guess the next winning lottery numbers... :-)

3. Your training expectations are too high and/or not realistic. Possible solution: lower your expectations. The network can never be 100% "sure".

4. No reason. Possible solution: check the code!

Most of those reasons are very easy to resolve, and they are a good subject for a future article. Meanwhile, we can enjoy the results.

**Enjoying the results**

Now we can see what the network has learned. In order to use the network, you have to load your data into the input layer. Then use the Run method to let the network process your data. Finally, get your results out of the output nodes of the network and analyze them (the BestNodeIndex property I created in the OCRNetwork class does this job for me). The following code fragment shows how to use the trained neural network in your OCR application:

```csharp
//Get your input data (your digitized image of the character)
double[] aInput = ...;
//Load the data into the network
for (int i = 0; i < backpropNetwork.InputNodesCount; i++)
    backpropNetwork.InputNode(i).Value = aInput[i];
//Run the network
backpropNetwork.Run();
//Get the result from the network and convert it to a character
return Convert.ToChar(aFirstChar + backpropNetwork.BestNodeIndex).ToString();
```

**License**

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3).

**Introduction**

There are many different approaches to the optical character recognition problem. One of the most common and popular approaches is based on neural networks, which can be applied to different tasks such as pattern recognition, clustering, function approximation, time series prediction, etc. In this article, I'll try to review some approaches for optical character recognition using artificial neural networks. The attached project is intended as a research project, so don't try to find here a ready solution for scanned document processing.

**Popular approach**

The most popular and simple approach to the OCR problem is based on a feed-forward neural network with backpropagation learning. The main idea is that we should first prepare a training set and then train a neural network to recognize patterns from the training set. In the training step we teach the network to respond with the desired output for a specified input. For this purpose, each training sample is represented by two components: a possible input and the desired network output for that input. After the training step is done, we can give an arbitrary input to the network and the network will form an output, from which we can resolve the pattern type presented to the network.

Let's assume that we want to train a network to recognize 26 capital letters represented as images of 5x6 pixels, something like this one:

One of the most obvious ways to convert an image to the input part of a training sample is to create a vector of size 30 (for our case), containing "1" in all positions corresponding to the letter's pixels and "0" in all positions corresponding to the background pixels. But, in many neural network training tasks, it's preferable to represent the training patterns in a so-called "bipolar" way, placing into the input vector "0.5" instead of "1" and "-0.5" instead of "0". Such pattern coding will lead to a great learning performance improvement. So, our training sample should look something like this:

```csharp
float[] input_letterK = new float[] {
     0.5f, -0.5f, -0.5f,  0.5f, -0.5f,
     0.5f, -0.5f,  0.5f, -0.5f, -0.5f,
     0.5f,  0.5f, -0.5f, -0.5f, -0.5f,
     0.5f, -0.5f,  0.5f, -0.5f, -0.5f,
     0.5f, -0.5f, -0.5f,  0.5f, -0.5f,
     0.5f, -0.5f, -0.5f, -0.5f,  0.5f};
```

For each possible input we need to create a desired network output to complete the training sample. For the OCR task it's very common to code each pattern as a vector of size 26 (because we have 26 different letters), placing "0.5" in the position corresponding to the pattern's type number and "-0.5" in all other positions. So, a desired output vector for the letter "K" will look something like this:

```csharp
// 0.5 is placed only in the position of the "K" letter
float[] output_letterK = new float[] {
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f,  0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f, -0.5f, -0.5f,
    -0.5f, -0.5f, -0.5f, -0.5f, -0.5f};
```

Finally, the last question is about the network's structure. For the above task we can use a one-layer neural network, which will have 30 inputs corresponding to the size of the input vector and 26 neurons in the layer corresponding to the size of the output vector. After having such training samples for all letters, we can start to train our network:

```csharp
// pattern size
int patternSize = 30;
// patterns count
int patterns = 26;

// learning input vectors
float[][] input = new float[26][] {
    ...
    // Letter K
    new float[] { 0.5f, -0.5f, -0.5f, 0.5f, ... },
    ...
};
// learning output vectors
float[][] output = new float[26][] {
    ...
    // Letter K
    new float[] { -0.5f, ..., 0.5f, ... },
    ...
};

// create neural network
AForge.NeuralNet.Network neuralNet = new AForge.NeuralNet.Network(
    new BipolarSigmoidFunction(2.0f), patternSize, patterns);
// randomize network`s weights
neuralNet.Randomize();

// create network teacher
AForge.NeuralNet.Learning.BackPropagationLearning teacher =
    new AForge.NeuralNet.Learning.BackPropagationLearning(neuralNet);
teacher.LearningLimit = 0.1f;
// teacher.LearningRate may be tuned as well

// teach the network
int i = 0;
do
{
    // teach all samples from the training set in one epoch
    teacher.LearnEpoch(input, output);
    i++;
} while (!teacher.IsConverged);
// System.Diagnostics.Debug.WriteLine("total learning epoch: " + i);
```

In the sample above, a complete neural network training procedure for the pattern recognition task is provided. On each learning epoch all samples from the training set are presented to the network and the summary squared error is calculated. When the error becomes less than the specified error limit, the training is done and the network can be used for recognition.

How do we recognize something? We need to feed the input to the trained network and get its output. Then we should find the element in the output vector with the maximum value. The element's number will point us to the recognized pattern:

```csharp
// "K" letter, but a little bit noised
float[] pattern = new float[] { ... };
// get network's output
float[] output = neuralNet.Compute(pattern);

// find the maximum of the output
float max = output[0];
int maxIndex = 0, n = output.Length;
for (int i = 1; i < n; i++)
{
    if (output[i] > max)
    {
        max = output[i];
        maxIndex = i;
    }
}
// System.Diagnostics.Debug.WriteLine(
//     "network thinks it is - " + (char)((int)'A' + maxIndex));
```
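The winner-takes-all step at the end, finding the output neuron with the maximum value and mapping its index back to a letter, is easy to isolate. Here is the same logic as a small C helper (my own illustration, not AForge code):

```c
#include <assert.h>

/* Return the index of the maximum element of the network's output
   vector; the index identifies the recognized letter. */
int best_index(const float *output, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (output[i] > output[best])
            best = i;
    return best;
}
```

Usage would be, e.g., `char letter = 'A' + best_index(output, 26);`.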

**Another approach**

The approach described above works fine. But there are some issues. Suppose we train our network using the above training set with letters of size 5x6. What should we do, then, if we need to recognize a letter which is represented by an 8x8 image? The obvious answer is to resize the image. But what about an image with a letter that is printed with a 72 point font size? I don't think we'll get a good result after resizing it to a 5x6 image. OK, let's train our network using 8x8 images, or even 16x16 to get high accuracy. But 16x16 images will lead to an input vector of size 256, which will be more performance-consuming for training the neural network.

Another idea is based on using so-called receptors. In this approach we'll form an input vector by using not the pixel values of the image, but the receptor values. What are these receptors? Receptors are represented by a set of lines with arbitrary size and direction. Any receptor will have an activated value ("0.5" in the input vector) if it crosses a letter and a deactivated value ("-0.5" in the input vector) if it does not cross a letter. The size of the input vector will be the same as the receptors count.

The advantage of this method is that we can train the neural network on big images even with a small amount of receptors. Suppose we have an image with a letter of arbitrary size. Resizing an image with a letter to 75x75 (or even 150x150) pixels will not lead to bad image quality, so it will be much easier to recognize. On the other hand, if we need to work with images of small size, we can always easily resize our receptors set, because the receptors are defined as lines with two coordinates; we can use a set of short horizontal and vertical receptors to achieve the same effect as in the case of using pixel values for images of small size. And another big advantage is that we can try to generate a rather small receptors set which will be able to recognize the entire training set using only the most significant letter features.

But there are some disadvantages too. It's not possible to recognize complex patterns, so the described approach can be applied only to the OCR task. But, as we are doing research in the OCR task area, it will not disturb us very much.

There is another question: how to generate the receptors set? Manually or randomly? And how can we be sure that the set is optimal? We can use the next approach for receptors set generation: first, we'll randomly generate a large set of receptors, for example 500, and then we'll choose a specified amount of the best receptors. How do we decide whether a specified receptor is good or not? We'll try to use entropy, which is well known to us from information theory. We'll use two concepts: inner entropy and outer entropy.

Let me explain it with a small example. Here is a table which contains some training data: five types of objects, represented by columns, and receptors, represented by rows. Each object has five different variants, so each cell of the table holds the receptor's values for the five variants of one object. Let's look, for example, at the first value of the table, "11101". It means that the first receptor crosses the first, the second, the third and the fifth variants of the first object, but does not cross the fourth variant. Is it clear? OK.

Inner entropy is the entropy of a specified receptor for a specified object. The inner entropy tells us how good the specified receptor is at recognizing the specified object, and its value should be as small as possible. Let's look at the second row and the first column, where there is "11111". Its entropy is 0 and it's good, because the receptor is 100% sure about the specified object: it has the same value for all the variants. The same thing is true for the second row and the second column, "00000"; its entropy is 0 too. But now let's look at the fifth row and the first column, "10101": the receptor crosses the first, third and fifth variants, but does not cross the second and the fourth. The entropy of this set is 0.971 and it's bad, because the receptor is not sure about the specified object. The average inner entropy is just the sum of the inner entropies of the receptor for all the objects, divided by the amount of objects.

Outer entropy is calculated from the receptor's values across all the objects. Here, the bigger it is, the closer it is to the value of "1", the better. Why? If the outer entropy is small, then the receptor is useless, because it cannot divide patterns. The best receptor should be activated for one half of all the objects and deactivated for the other half.

So, here is the final formula for calculating the receptor's usability:

usability = OuterEntropy * (1 - AverageInnerEntropy)

Using this idea, we can filter a predefined amount of receptors; for example, we can save the 100 receptors with the best usability. Then we should generate temporary training inputs using these receptors and, with the filtered training data, continue with the training of our network in the same manner as described in the above approach. The filtering procedure will reduce the amount of training data as well as the neural network's input count.

**Test application**

A test application is provided with the article, which tries to implement the second approach. How to use it? Let's try the first test:

• Generate initial receptors set. Initially we should randomly generate a big set of receptors, for example 500. On application startup it's already generated, so we can skip this step if we are not planning to change the initial amount of receptors or the filtered amount.
• Select fonts. Let it be the regular Arial font for the first time.
• Generate data. In this step the initial training data will be generated, which will be used for teaching the network.
• Filter data. In this step the initial receptors set as well as the training data will be filtered.
• Create network. On the basis of the filtered data, a neural network will be created.
• Train network. Neural network training. At the end of training you should get a misclassified value of "0/26", which means that the trained network can successfully recognize all patterns from the training set.
• We can check that all images from the training set can be recognized by using the "Draw" and "Recognize" buttons.

After performing all these steps we find that the concept is working! You can try a more complex test, choosing all regular fonts. You will need to play a little bit with the learning speed and error limit values: you can set the error limit of the first pass to "0.5" for faster training, and then set the error limit of the second pass to "0.1" if you are not in a hurry. You can also try to use a two-layered network. At the end of training you should get a misclassified value of "0/130". You can even try to teach the network all fonts, regular and italic, but don't forget to turn on the "Scale" option before data generation; the option will scale all images from the training set. I was able to get a misclassified value of "4/260" with only 100 receptors.

**Conclusion**

From the above tests, it looks like the second approach is able to perform the OCR task. But all our experiments were made using an ideal training set, and the result is still not very outstanding. Still, it works! And with some additional research and improvements we can try to use it for some real tasks. The application also allows you to recognize hand-drawn letters, but we should always use the "Scale" option in this case. Possible future research can be done in the direction of better receptors set generation and filtering, and image scaling.

I shall take this to be the sigmoid function Out = 1. let us begin. if we already have a complete noise-free set of input and output vectors. We can then write Layer2In = Weight[0] . and it is convenient to call this weight[0]. if we want the system to generalize.0 + exp(-In)). Bullinaria from the School of Computer Science of The University of Birmingham. the input In into a given neuron will be the weighted sum of output activations feeding in from a number of other neurons. I shall assume that the reader is already familiar with C. namely a simple three-layer feed-forward back-propagation network (multi layer perceptron). It is convenient to think of the activations flowing through layers of neurons. i. /* start with the bias */ for( i = 1 .0 .Sigmoid) . Of course. So.e.here I shall concentrate on one particularly common and useful type. simply refer the reader to the newsgroup comp. This document contains a step by step guide to implementing a simple neural network in C. It also has the convenient property that its derivative takes the particularly simple form Sigmoid_Derivative = Sigmoid * (1. i++ ) { /* i loop over layer 1 units */ . our training set) will often do a pretty good job for new inputs as well. and our system must produce an appropriate output for each input it is given.ai. for more details about neural networks in general. or resting state. then a neural network that has learned how to map between the known inputs and outputs (i. i <= NumUnits1 . This has the effect of squashing the infinite range of In into the range 0 to 1. /* Out = Sigmoid(In) */ though other activation functions are often used (e. and. processing unit) takes it total input In and produces an output activation Out. then a simple look-up table would suffice. linear or hyperbolic tangent).g.. that is added to the sum of inputs. A single neuron (i. produce appropriate outputs for inputs it has never seen before. 
This type of network will be useful when we have a set of input vectors and a corresponding set of output vectors. if there are NumUnits1 neurons in layer 1. Obviously there are many types of neural network one could consider using . However.neural-nets and the associated Neural Networks FAQ.e. Typically.e.John Bullinaria's Step by Step Guide to Implementing a Neural Network in C By John A. where Weight[i] is the strength/weight of the connection between unit i in layer 1 and our unit in layer 2.. Each neuron will also have a bias. UK. the total activation flowing into our layer 2 neuron is just the sum over Layer1Out[i]*Weight[i]. It is aimed mainly at students who wish to (or have been told to) incorporate a neural network learning component into a larger system they are building. So.0/(1.

Normally layer 2 will have many units as well, so it is appropriate to write the weights between unit i in layer 1 and unit j in layer 2 as an array Weight[i][j]. Thus to get the output of unit j in layer 2 we have

    Layer2In[j] = Weight[0][j] ;
    for( i = 1 ; i <= NumUnits1 ; i++ ) {
        Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
    }
    Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;

Remember that in C the array indices start from zero, not one, so we would declare our variables as

    double Layer1Out[NumUnits1+1] ;
    double Layer2In[NumUnits2+1] ;
    double Layer2Out[NumUnits2+1] ;
    double Weight[NumUnits1+1][NumUnits2+1] ;

(or, more likely, declare pointers and use calloc or malloc to allocate the memory). Naturally, we need another loop to get all the layer 2 outputs

    for( j = 1 ; j <= NumUnits2 ; j++ ) {
        Layer2In[j] = Weight[0][j] ;
        for( i = 1 ; i <= NumUnits1 ; i++ ) {
            Layer2In[j] += Layer1Out[i] * Weight[i][j] ;
        }
        Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
    }

Three layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a third layer in the same way as above

    for( j = 1 ; j <= NumUnits2 ; j++ ) {        /* j loop computes layer 2 activations */
        Layer2In[j] = Weight12[0][j] ;
        for( i = 1 ; i <= NumUnits1 ; i++ ) {
            Layer2In[j] += Layer1Out[i] * Weight12[i][j] ;
        }
        Layer2Out[j] = 1.0/(1.0 + exp(-Layer2In[j])) ;
    }
    for( k = 1 ; k <= NumUnits3 ; k++ ) {        /* k loop computes layer 3 activations */
        Layer3In[k] = Weight23[0][k] ;
        for( j = 1 ; j <= NumUnits2 ; j++ ) {
            Layer3In[k] += Layer2Out[j] * Weight23[j][k] ;
        }
        Layer3Out[k] = 1.0/(1.0 + exp(-Layer3In[k])) ;
    }

The code can start to become confusing at this point - I find that keeping a separate index i, j, k for each layer helps, as does an intuitive notation for distinguishing between the different layers of weights Weight12 and Weight23. For obvious reasons, for three layer networks, it is traditional to call layer 1 the Input layer, layer 2 the Hidden layer, and layer 3 the Output layer. Also, to save getting all the In's and Out's confused, we can write LayerNIn as SumN. Our network thus takes on the familiar form that we shall use for the rest of this document, and our code can be written

    for( j = 1 ; j <= NumHidden ; j++ ) {        /* j loop computes hidden unit activations */
        SumH[j] = WeightIH[0][j] ;
        for( i = 1 ; i <= NumInput ; i++ ) {
            SumH[j] += Input[i] * WeightIH[i][j] ;
        }
        Hidden[j] = 1.0/(1.0 + exp(-SumH[j])) ;
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {        /* k loop computes output unit activations */
        SumO[k] = WeightHO[0][k] ;
        for( j = 1 ; j <= NumHidden ; j++ ) {
            SumO[k] += Hidden[j] * WeightHO[j][k] ;
        }
        Output[k] = 1.0/(1.0 + exp(-SumO[k])) ;
    }

Generally we will have a whole set of NumPattern training patterns, i.e. pairs of input and target output vectors, Input[p][i] and Target[p][k], labelled by the index p. The network learns by minimizing some measure of the error of the network's actual outputs compared with the target outputs. For example, the sum squared error over all output units k and all training patterns p will be given by

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {
        for( k = 1 ; k <= NumOutput ; k++ ) {
            Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
        }
    }

(The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm.) If we insert the above code for computing the network outputs into the p loop of this, we end up with

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {       /* p loop over training patterns */
        for( j = 1 ; j <= NumHidden ; j++ ) {    /* j loop over hidden units */
            SumH[p][j] = WeightIH[0][j] ;
            for( i = 1 ; i <= NumInput ; i++ ) {
                SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
            }
            Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
        }
        for( k = 1 ; k <= NumOutput ; k++ ) {    /* k loop computes output unit activations */
            SumO[p][k] = WeightHO[0][k] ;
            for( j = 1 ; j <= NumHidden ; j++ ) {
                SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
            }
            Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
            Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;   /* Sum Squared Error */
        }
    }

I'll leave the reader to dispense with any indices that they don't need for the purposes of their own system (e.g. the indices on SumH and SumO).

The next stage is to iteratively adjust the weights to minimize the network's error. A standard way to do this is by 'gradient descent' on the error function. We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWeight) and shift the weights by a small amount in the direction that reduces the error. The literature is full of variations on this general approach - I shall begin with the 'standard on-line back-propagation with momentum' algorithm. This is not the place to go through all the mathematics, but for the above sum squared error we can compute and apply one iteration (or 'epoch') of the required weight changes DeltaWeightIH and DeltaWeightHO using

    Error = 0.0 ;
    for( p = 1 ; p <= NumPattern ; p++ ) {       /* repeat for all the training patterns */
        for( j = 1 ; j <= NumHidden ; j++ ) {    /* compute hidden unit activations */
            SumH[p][j] = WeightIH[0][j] ;
            for( i = 1 ; i <= NumInput ; i++ ) {
                SumH[p][j] += Input[p][i] * WeightIH[i][j] ;
            }
            Hidden[p][j] = 1.0/(1.0 + exp(-SumH[p][j])) ;
        }
        for( k = 1 ; k <= NumOutput ; k++ ) {    /* compute output unit activations and errors */
            SumO[p][k] = WeightHO[0][k] ;
            for( j = 1 ; j <= NumHidden ; j++ ) {
                SumO[p][k] += Hidden[p][j] * WeightHO[j][k] ;
            }
            Output[p][k] = 1.0/(1.0 + exp(-SumO[p][k])) ;
            Error += 0.5 * (Target[p][k] - Output[p][k]) * (Target[p][k] - Output[p][k]) ;
            DeltaO[k] = (Target[p][k] - Output[p][k]) * Output[p][k] * (1.0 - Output[p][k]) ;
        }
        for( j = 1 ; j <= NumHidden ; j++ ) {    /* 'back-propagate' errors to hidden layer */
            SumDOW[j] = 0.0 ;
            for( k = 1 ; k <= NumOutput ; k++ ) {
                SumDOW[j] += WeightHO[j][k] * DeltaO[k] ;
            }
            DeltaH[j] = SumDOW[j] * Hidden[p][j] * (1.0 - Hidden[p][j]) ;
        }
        for( j = 1 ; j <= NumHidden ; j++ ) {    /* update weights WeightIH */
            DeltaWeightIH[0][j] = eta * DeltaH[j] + alpha * DeltaWeightIH[0][j] ;
            WeightIH[0][j] += DeltaWeightIH[0][j] ;
            for( i = 1 ; i <= NumInput ; i++ ) {
                DeltaWeightIH[i][j] = eta * Input[p][i] * DeltaH[j] + alpha * DeltaWeightIH[i][j] ;
                WeightIH[i][j] += DeltaWeightIH[i][j] ;
            }
        }
        for( k = 1 ; k <= NumOutput ; k++ ) {    /* update weights WeightHO */
            DeltaWeightHO[0][k] = eta * DeltaO[k] + alpha * DeltaWeightHO[0][k] ;
            WeightHO[0][k] += DeltaWeightHO[0][k] ;
            for( j = 1 ; j <= NumHidden ; j++ ) {
                DeltaWeightHO[j][k] = eta * Hidden[p][j] * DeltaO[k] + alpha * DeltaWeightHO[j][k] ;
                WeightHO[j][k] += DeltaWeightHO[j][k] ;
            }
        }
    }

(There is clearly plenty of scope for re-ordering, combining and simplifying the loops here - I will leave that for the reader to do once they have understood what the separate code sections are doing.) The weight changes DeltaWeightIH and DeltaWeightHO are each made up of two components. First, the eta component that is the gradient descent contribution. Second, the alpha component that is a 'momentum' term which effectively keeps a moving average of the gradient descent weight change contributions, and thus smoothes out the overall weight changes. Fixing good values of the learning parameters eta and alpha is usually a matter of trial and error. Certainly alpha must be in the range 0 to 1, and a non-zero value does usually speed up learning. Finding a good value for eta will depend on the problem, and also on the value chosen for alpha. If it is set too low, the training will be unnecessarily slow. Having it too large will cause the weight changes to oscillate wildly, and can slow down or even prevent learning altogether. (I generally start by trying eta = 0.1 and explore the effects of repeatedly doubling or halving it.)

The complete training process will consist of repeating the above weight updates for a number of epochs (using another for loop) until some error criterion is met, for example the Error falls below some chosen small number. (Note that, with sigmoids on the outputs, the Error can only reach exactly zero if the weights reach infinity! Note also that sometimes the training can get stuck in a 'local minimum' of the error function and never get anywhere near the actual minimum.) So, we need to wrap the last block of code in something like

    for( epoch = 1 ; epoch < LARGENUMBER ; epoch++ ) {
        /* ABOVE CODE FOR ONE ITERATION */
        if( Error < SMALLNUMBER ) break ;
    }

If the training patterns are presented in the same systematic order during each epoch, it is possible for weight oscillations to occur. It is therefore generally a good idea to use a new random order for the training patterns for each epoch. If we put the NumPattern training pattern indices p in random order into an array ranpat[], then it is simply a matter of replacing our training pattern loop

    for( p = 1 ; p <= NumPattern ; p++ ) {

with

    for( np = 1 ; np <= NumPattern ; np++ ) {
        p = ranpat[np] ;

Generating the random array ranpat[] is not quite so simple, but the following code will do the job

    for( p = 1 ; p <= NumPattern ; p++ ) {    /* set up ordered array */
        ranpat[p] = p ;
    }
    for( p = 1 ; p <= NumPattern ; p++ ) {    /* swap random elements into each position */
        np = p + rando() * ( NumPattern + 1 - p ) ;
        op = ranpat[p] ; ranpat[p] = ranpat[np] ; ranpat[np] = op ;
    }

Naturally, one must set some initial network weights to start the learning process. Starting all the weights at zero is generally not a good idea, as that is often a local minimum of the error function. It is normal to initialize all the weights with small random values. If rando() is your favourite random number generator function that returns a flat distribution of random numbers in the range 0 to 1, and smallwt is the maximum absolute size of your initial weights, then an appropriate section of weight initialization code would be

    for( j = 1 ; j <= NumHidden ; j++ ) {     /* initialize WeightIH and DeltaWeightIH */
        for( i = 0 ; i <= NumInput ; i++ ) {
            DeltaWeightIH[i][j] = 0.0 ;
            WeightIH[i][j] = 2.0 * ( rando() - 0.5 ) * smallwt ;
        }
    }
    for( k = 1 ; k <= NumOutput ; k++ ) {     /* initialize WeightHO and DeltaWeightHO */
        for( j = 0 ; j <= NumHidden ; j++ ) {
            DeltaWeightHO[j][k] = 0.0 ;
            WeightHO[j][k] = 2.0 * ( rando() - 0.5 ) * smallwt ;
        }
    }

Note that it is a good idea to set all the initial DeltaWeights to zero at the same time.

We now have enough code to put together a working neural network program. I have cut and pasted the above code into the file nn.c (which your browser should allow you to save into your own file space), added the standard #includes, declared all the variables, hard coded the standard XOR training data and values for eta, alpha and smallwt, #defined an over simple rando(), added some print statements to show what the network is doing, and wrapped the whole lot in a main(){ }. The file should compile and run in the normal way (e.g. using the UNIX commands 'cc nn.c -O -lm -o nn' and 'nn').

Naturally, I've left plenty for the reader to do to convert this into a useful program, for example:

• Read the training data from file
• Allow the parameters (eta, alpha, smallwt, NumHidden, etc.) to be varied during runtime

• Have appropriate array sizes determined and allocate their memory during runtime
• Saving of weights to file, and reading them back in again
• Plotting of errors, output activations, etc., during training

There are also numerous network variations that could be implemented, for example:

• Batch learning, rather than on-line learning
• Alternative activation functions (linear, tanh, etc.)
• Real (rather than binary) valued outputs require linear output functions

      Output[p][k] = SumO[p][k] ;
      DeltaO[k] = Target[p][k] - Output[p][k] ;

• Cross-Entropy cost function rather than Sum Squared Error

      Error -= ( Target[p][k] * log( Output[p][k] ) + ( 1.0 - Target[p][k] ) * log( 1.0 - Output[p][k] ) ) ;
      DeltaO[k] = Target[p][k] - Output[p][k] ;

• Separate training, validation and testing sets
• Weight decay / Regularization

Image Recognition with Neural Networks

By Murat Firat | 30 Oct 2007

This article contains a brief description of BackPropagation Artificial Neural Network and its implementation for Image Recognition

• Download source - 286.16 KB
• Download demo project - 257.52 KB

Introduction

Artificial Neural Networks are a recent development tool that are modeled from biological neural networks. The powerful side of this new tool is its ability to solve problems that are very hard to solve by traditional computing methods (e.g. by algorithms). This work briefly explains Artificial Neural Networks and their applications, describing how to implement a simple ANN for image recognition.

Background

I will try to make the idea clear to the reader who is just interested in the topic.

About Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are a new approach that follows a different way from traditional computing methods to solve problems. Since conventional computers use an algorithmic approach, if the specific steps that the computer needs to follow are not known, the computer cannot solve the problem. That means traditional computing methods can only solve the problems that we have already understood and knew how to solve. However, ANNs are, in some way, much more powerful because they can solve problems that we do not exactly know how to solve. That's why, of late, their usage is spreading over a wide range of areas including virus detection, intrusion detection systems, robot control, pattern (image, fingerprint, noise, etc.) recognition and so on.

ANNs have the ability to adapt, learn, generalize, cluster or organize data. There are many structures of ANNs including Perceptron, Adaline, Madaline, Kohonen, BackPropagation and many others. Probably, BackPropagation ANN is the most commonly used, as it is very simple to implement and effective. In this work, we will deal with BackPropagation ANNs.

BackPropagation ANNs contain one or more layers, each of which is linked to the next layer. The first layer is called the "input layer", which meets the initial input (e.g. pixels from a letter), and so does the last one, the "output layer", which usually holds the input's identifier (e.g. the name of the input letter). The layers between the input and output layers are called "hidden layer(s)", which only propagate the previous layer's outputs to the next layer and [back] propagate the following layer's errors to the previous layer. A typical BackPropagation ANN is as depicted below. The black nodes (on the extreme left) are the initial inputs.

Training such a network involves two phases. In the first phase, the inputs are propagated forward to compute the outputs for each output node. Then, each of these outputs is subtracted from its desired output, causing an error [an error for each output node]. In the second phase, each of these output errors is passed backward and the weights are fixed. These two phases are continued until the sum of the [squares of the output errors] reaches an acceptable value. Actually, these are the main operations of training a BackPropagation ANN.

Implementation

The network layers in the figure above are implemented as arrays of structs. The nodes of the layers are implemented as follows:

    [Serializable]
    struct PreInput
    {
        public double Value;
        public double[] Weights;
    };

    [Serializable]
    struct Input
    {
        public double InputSum;
        public double Output;
        public double Error;
        public double[] Weights;
    };

    [Serializable]
    struct Hidden
    {
        public double InputSum;
        public double Output;
        public double Error;
        public double[] Weights;
    };

    [Serializable]
    struct Output<T> where T : IComparable<T>
    {
        public double InputSum;
        public double output;
        public double Error;
        public double Target;
        public T Value;
    };

The layers in the figure are implemented as follows (for a three layer network):

    private PreInput[] PreInputLayer;
    private Input[] InputLayer;
    private Hidden[] HiddenLayer;
    private Output<string>[] OutputLayer;

Training the network can be summarized as follows:

• Apply input to the network.
• Calculate the output.
• Compare the resulting output with the desired output for the given input. This is called the error.
• Modify the weights for all neurons using the error.
• Repeat the process until the error reaches an acceptable value (e.g. error < 1%), which means that the NN was trained successfully, or until we reach a maximum count of iterations, which means that the NN training was not successful.

It is represented as shown below:

    void TrainNetwork(TrainingSet, MaxError)
    {
        while(CurrentError > MaxError)
        {
            foreach(Pattern in TrainingSet)
            {
                ForwardPropagate(Pattern);  // calculate output
                BackPropagate();            // fix errors, update weights
            }
        }
    }

This is implemented as follows:

    public bool Train()
    {
        double currentError = 0;
        int currentIteration = 0;
        NeuralEventArgs Args = new NeuralEventArgs();

        do
        {
            currentError = 0;
            foreach (KeyValuePair<T, double[]> p in TrainingSet)
            {
                NeuralNet.ForwardPropagate(p.Value, p.Key);
                NeuralNet.BackPropagate();
                currentError += NeuralNet.GetError();
            }
            currentIteration++;

            if (IterationChanged != null && currentIteration % 5 == 0)
            {
                Args.CurrentError = currentError;
                Args.CurrentIteration = currentIteration;
                IterationChanged(this, Args);
            }

        } while (currentError > maximumError &&
                 currentIteration < maximumIteration && !Args.Stop);

        if (IterationChanged != null)
        {
            Args.CurrentError = currentError;
            Args.CurrentIteration = currentIteration;
            IterationChanged(this, Args);
        }

        if (currentIteration >= maximumIteration || Args.Stop)
            return false;   // Training Not Successful

        return true;
    }

Where the ForwardPropagate(..) and BackPropagate() methods are as shown for a three layer network:

    private void ForwardPropagate(double[] pattern, T output)
    {
        int i, j;
        double total;

        //Apply input to the network
        for (i = 0; i < PreInputNum; i++)
        {
            PreInputLayer[i].Value = pattern[i];
        }
        //Calculate The First(Input) Layer's Inputs and Outputs
        for (i = 0; i < InputNum; i++)
        {
            total = 0.0;
            for (j = 0; j < PreInputNum; j++)
            {
                total += PreInputLayer[j].Value * PreInputLayer[j].Weights[i];
            }
            InputLayer[i].InputSum = total;
            InputLayer[i].Output = F(total);
        }
        //Calculate The Second(Hidden) Layer's Inputs and Outputs
        for (i = 0; i < HiddenNum; i++)
        {
            total = 0.0;
            for (j = 0; j < InputNum; j++)
            {
                total += InputLayer[j].Output * InputLayer[j].Weights[i];
            }
            HiddenLayer[i].InputSum = total;
            HiddenLayer[i].Output = F(total);
        }
        //Calculate The Third(Output) Layer's Inputs, Outputs, Targets and Errors
        for (i = 0; i < OutputNum; i++)
        {
            total = 0.0;
            for (j = 0; j < HiddenNum; j++)
            {
                total += HiddenLayer[j].Output * HiddenLayer[j].Weights[i];
            }
            OutputLayer[i].InputSum = total;
            OutputLayer[i].output = F(total);
            OutputLayer[i].Target = OutputLayer[i].Value.CompareTo(output) == 0 ? 1.0 : 0.0;
            OutputLayer[i].Error = (OutputLayer[i].Target - OutputLayer[i].output) *
                                   (OutputLayer[i].output) * (1 - OutputLayer[i].output);
        }
    }

    private void BackPropagate()
    {
        int i, j;
        double total;

        //Fix Hidden Layer's Error
        for (i = 0; i < HiddenNum; i++)
        {
            total = 0.0;
            for (j = 0; j < OutputNum; j++)
            {
                total += HiddenLayer[i].Weights[j] * OutputLayer[j].Error;
            }
            HiddenLayer[i].Error = total;
        }
        //Fix Input Layer's Error
        for (i = 0; i < InputNum; i++)
        {
            total = 0.0;
            for (j = 0; j < HiddenNum; j++)
            {
                total += InputLayer[i].Weights[j] * HiddenLayer[j].Error;
            }
            InputLayer[i].Error = total;
        }
        //Update The First Layer's Weights
        for (i = 0; i < InputNum; i++)
        {
            for (j = 0; j < PreInputNum; j++)
            {
                PreInputLayer[j].Weights[i] +=
                    LearningRate * InputLayer[i].Error * PreInputLayer[j].Value;
            }
        }
        //Update The Second Layer's Weights
        for (i = 0; i < HiddenNum; i++)
        {
            for (j = 0; j < InputNum; j++)
            {
                InputLayer[j].Weights[i] +=
                    LearningRate * HiddenLayer[i].Error * InputLayer[j].Output;
            }
        }
        //Update The Third Layer's Weights
        for (i = 0; i < OutputNum; i++)
        {
            for (j = 0; j < HiddenNum; j++)
            {
                HiddenLayer[j].Weights[i] +=
                    LearningRate * OutputLayer[i].Error * HiddenLayer[j].Output;
            }
        }
    }

Testing the App

The program trains the network using bitmap images that are located in a folder. This folder must be in the following format:

• There must be one (input) folder that contains the input images [*.bmp].
• Each image's name is the target (or output) value for the network (the pixel values of the image are the inputs, of course).

As testing the classes requires training the network first, there must be a folder in this format. The "PATTERNS" and "ICONS" folders [depicted below] in the Debug folder fit this format.

History

• 30th September, 2007: Simplified the app
• 24th June, 2007: Initial Release

References & External Links

• Principles of training multi-layer neural network using backpropagation algorithm
• Neural Networks by Christos Stergiou and Dimitrios Siganos
• An Introduction to Neural Networks, Ben Krose & Patrick van der Smagt

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
