
Neural Networks Assignment

An Introduction to Back-Propagation Algorithm

Submitted by: Anish Joshi, 09BIF090

An Introduction to Back-Propagation Neural Networks


Introduction

This article focuses on a particular type of neural network model, known as a "feed-forward back-propagation network". This model is easy to understand, and can be easily implemented as a software simulation.

What is a Neural Network?

The area of Neural Networks probably belongs to the borderline between Artificial Intelligence and Approximation Algorithms. Think of them as algorithms for "smart approximation". NNs are used in (to name a few) universal approximation (mapping inputs to outputs), tools capable of learning from their environment, tools for finding non-evident dependencies between data, and so on.

Neural networking algorithms (at least some of them) are modelled after the brain (not necessarily the human brain) and the way it processes information. The brain is a very efficient tool. Despite having a response time about 100,000 times slower than computer chips, it (so far) beats the computer at complex tasks, such as image and sound recognition, motion control and so on. It is also about 10,000,000,000 times more efficient than the computer chip in terms of energy consumption per operation. The brain is a multi-layer structure (think 6-7 layers of neurons, if we are talking about the human cortex) with 10^11 neurons, a structure that works as a parallel computer capable of learning from the "feedback" it receives from the world and changing its design (think of computer hardware changing while performing the task) by growing new neural links between neurons or altering the activities of existing ones. To make the picture a bit more complete, let's also mention that a typical neuron is connected to 50-100 other neurons, and sometimes to itself, too. To put it simply, the brain is composed of interconnected neurons.

Structure of a Neuron

Our "artificial" neuron will have inputs (all N of them) and one output. The neuron has:

A set of nodes that connect it to inputs, output, or other neurons; these nodes are also called synapses.

A Linear Combiner, which is a function that takes all inputs and produces a single value. A simple way of doing it is by adding together each dInput (if you are not a programmer: the "d" prefix means "double", so the name dInput tells us the variable holds a floating point number) multiplied by the corresponding synaptic weight dWeight:

    for(int i = 0; i < nNumOfInputs; i++)
        dSum = dSum + dInput[i] * dWeight[i];

An Activation Function. We do not know what the input will be. Consider this example: the human ear can function near a working jet engine, and at the same time, if it were only ten times more sensitive, we would be able to hear a single molecule hitting the membrane in our ears! What does that mean? It means that the response to the input should not be linear. When we go from 0.01 to 0.02, the difference should be comparable with going from 100 to 200.
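Before moving on, here is the combiner loop fleshed out into a self-contained function, a minimal sketch keeping the article's Hungarian-style names (the function name LinearCombiner is an illustrative choice, not from the article):

    // The Linear Combiner: adds together every input multiplied by its
    // synaptic weight and returns the single resulting value.
    double LinearCombiner(const double dInput[], const double dWeight[],
                          int nNumOfInputs)
    {
        double dSum = 0.0;
        for(int i = 0; i < nNumOfInputs; i++)
            dSum = dSum + dInput[i] * dWeight[i];
        return dSum;
    }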

How do we make the response non-linear? By applying the Activation Function. It will take ANY input from minus infinity to plus infinity and squeeze it into the -1 to 1 or the 0 to 1 interval.

Finally, we have a threshold. What should the INTERNAL ACTIVITY of a neuron be when there is no input? Should there be some threshold input before we have any activity? Or should some level of activity be present when the input is zero (in which case it is called a bias rather than a threshold)? For simplicity, we (as well as the rest of the world) will replace the threshold with an EXTRA input, whose weight can change during the learning process and whose input value is fixed and always equal to (-1). The effect, in terms of the mathematical equations, is exactly the same, but the programmer gets a little more breathing room.

A single neuron by itself is not a very useful pattern recognition tool. The real power of neural networks comes when we combine neurons into multilayer structures, called... well... neural networks. There are 3 layers in our network (we can use more, but if we use fewer, we will have a less capable net; making 4 layers is sometimes useful when you are looking for non-evident things, and I have never seen a problem that requires 5 layers - for 99 percent of tasks, 3 layers is the best choice). There are N neurons in the first layer, where N equals the number of inputs. There are M neurons in the output layer, where M equals the number of outputs. For example, when you are building a network capable of predicting the stock price, you might want yesterday's hi, lo, close and volume as inputs, and the close as the output. You may have any number of neurons in the inner (also called "hidden") layers. Just remember that if you have too few, the quality of the prediction will drop because the net doesn't have enough "brains". And if you have too many, the net will have a tendency to "remember" the right answers rather than predicting them. Then your neural net will work very well on familiar data but will fail on data that was never presented before. Finding the compromise is more of an art than a science.

Teaching the Neural Net

The NN receives inputs, which can be a pattern of some kind. In the case of image recognition software, for example, it would be pixels from a photosensitive matrix of some kind; in the case of stock price prediction, it would be the "hi" (input 1), "low" (input 2) and so on. After a neuron in the first layer has received its input, it applies the Linear Combiner and the Activation Function to the inputs and produces the output. This output, as you can see from the picture, will become one of the inputs for the neurons in the next layer. So the layer will feed the data forward to the next layer, and so on, until the last layer is reached.

Let's use our example with the stock price. We will try to use yesterday's stock price to predict today's price, which is the same as using today's price to predict tomorrow's price... When we work with yesterday's price, we not only know the price for "day - 1" but also the price we are trying to predict, called the DESIRED OUTPUT of the neural net. When we compare the two values, we can compute the error:

    dError = dDesiredOutput - dOutput;

Now we can adjust this particular neuron to work better with this particular input. For example, if dError is 10% of dOutput, we can increase all synaptic weights of the neuron by 10%. The problem with this approach is that the next input will require a different adjustment.
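To make the problem concrete, the naive per-pattern correction just described might look like this in code; a sketch under stated assumptions (the helper name AdjustNeuronNaively and the proportional update rule are illustrative, not code from the article):

    // Naive correction: scale every synaptic weight by the relative error
    // computed for the current pattern. Each new pattern would then undo
    // the correction made for the previous one.
    void AdjustNeuronNaively(double dWeight[], int nNumOfInputs,
                             double dOutput, double dDesiredOutput)
    {
        double dError = dDesiredOutput - dOutput;
        double dAdjustment = dError / dOutput;   // e.g. 0.10 for a 10% error
        for(int i = 0; i < nNumOfInputs; i++)
            dWeight[i] = dWeight[i] * (1.0 + dAdjustment);
    }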

But what if, for each pattern, we perform a SMALL adjustment in the right direction? To do it, we need to introduce a couple of new variables.

The learning rate. Say we found that for this particular pattern, the adjustment should be 10%. Then we perform the following operation:

    dNewWeight = dOldWeight * (1.0 + dAdjustment * dLearningRate);

The learning rate (dLearningRate) is the importance of a single pattern. For example, we can set it to 0.01; then it will take 100 patterns to make a 10% adjustment.

Momentum is not something that we NEED, but it can speed up calculations significantly. Consider this: we have 100 patterns, and we noticed that each moves us 0.01% towards some value. Wouldn't it be better to move faster, as long as we keep moving in the same direction? Think of the learning rate as the acceleration, and think of momentum as the speed. As the NN is learning, the errors will decrease (as the network is getting better), and we will have to adjust the learning rate and momentum. You can download the Cortex software that does it already ;)

Once we have decided what adjustment to apply to the neurons in the output layer, we can backpropagate the changes to the previous layers of the network. Indeed, as soon as we have desired outputs for the output layer, we can make an adjustment to reduce the error (the difference between the output and the desired output). The adjustment will change the weights of the input nodes of the neurons in the output layer. But the input nodes of the last layer are the OUTPUT nodes of the previous layer! So we have the actual output of the previous layer and its desired output (after correction) - and we can adjust the previous layer of the net. And so on, until we reach the first layer.

First we will discuss the basic concepts behind this type of NN, then we'll get into some of the more practical application ideas.

Complex Problems

The field of neural networks can be thought of as being related to artificial intelligence, machine learning, parallel processing, statistics, and other fields. The attraction of neural networks is that they are best suited to solving the problems that are the most difficult to solve by traditional computational methods. Consider an image processing task such as recognizing an everyday object projected against a background of other objects. This is a task that even a small child's brain can solve in a few tenths of a second. But building a conventional serial machine to perform as well is incredibly complex. However, that same child might NOT be capable of calculating 2+2=4, while the serial machine solves it in a few nanoseconds. A fundamental difference between the image recognition problem and the addition problem is that the former is best solved in a parallel fashion, while simple mathematics is best done serially. Neurobiologists believe that the brain is similar to a massively parallel analog computer, containing about 10^10 simple processors which each require a few milliseconds to respond to input. With neural network technology, we can use parallel processing methods to solve some real-world problems where it is very difficult to define a conventional algorithm.

The Feed-Forward Neural Network Model

If we consider the human brain to be the 'ultimate' neural network, then ideally we would like to build a device which imitates the brain's functions. However, because of limits in our technology, we must settle for a much simpler design. The obvious approach is to design a small electronic device which has a transfer function similar to a biological neuron, and then connect each neuron to many other neurons, using RLC networks to imitate the dendrites, axons, and synapses. This type of electronic model is still rather complex to implement, and we may have difficulty 'teaching' the network to do anything useful. Further constraints are needed to make the design more manageable.

First, we change the connectivity between the neurons so that they are in distinct layers, such that each neuron in one layer is connected to every neuron in the next layer. Further, we define that signals flow only in one direction across the network, and we simplify the neuron and synapse design to behave as analog comparators being driven by the other neurons through simple resistors. We now have a feed-forward neural network model that may actually be practical to build and use.

Referring to figures 1 and 2, the network functions as follows: each neuron receives a signal from the neurons in the previous layer, and each of those signals is multiplied by a separate weight value. The weighted inputs are summed, and passed through a limiting function which scales the output to a fixed range of values. The output of the limiter is then broadcast to all of the neurons in the next layer. So, to use the network to solve a problem, we apply the input values to the inputs of the first layer, allow the signals to propagate through the network, and read the output values.

Figure 1. A Generalized Network. Stimulation is applied to the inputs of the first layer, and signals propagate through the middle (hidden) layer(s) to the output layer. Each link between neurons has a unique weighting value.

Figure 2. The Structure of a Neuron. Inputs from one or more previous neurons are individually weighted, then summed. The result is non-linearly scaled between 0 and +1, and the output value is passed on to the neurons in the next layer.
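As a sketch of the forward pass just described (the layer representation, the names, and the logistic squashing function are assumptions for illustration; the article only specifies a weighted sum followed by a limiter that scales the result between 0 and +1):

    #include <cmath>
    #include <vector>

    // The limiting function of Figure 2: squashes any sum into (0, 1).
    static double Squash(double dSum) { return 1.0 / (1.0 + std::exp(-dSum)); }

    // One feed-forward step: computes the outputs of a layer from the
    // outputs of the previous layer. dWeight[j][i] is the weight on the
    // link from previous-layer neuron i to this-layer neuron j.
    std::vector<double> FeedForwardLayer(const std::vector<double>& dPrev,
                                         const std::vector<std::vector<double>>& dWeight)
    {
        std::vector<double> dOut(dWeight.size());
        for(size_t j = 0; j < dWeight.size(); j++) {
            double dSum = 0.0;
            for(size_t i = 0; i < dPrev.size(); i++)
                dSum += dPrev[i] * dWeight[j][i];   // weighted inputs, summed
            dOut[j] = Squash(dSum);                 // limiting function
        }
        return dOut;
    }

Running the whole network is then just a matter of calling FeedForwardLayer once per layer, feeding each layer's output vector to the next.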

Since the real uniqueness or 'intelligence' of the network exists in the values of the weights between neurons, we need a method of adjusting the weights to solve a particular problem. For this type of network, the most common learning algorithm is called Back Propagation (BP). A BP network learns by example; that is, we must provide a learning set that consists of some input examples and the known-correct output for each case. So, we use these input-output examples to show the network what type of behavior is expected, and the BP algorithm allows the network to adapt.

The BP learning process works in small iterative steps: one of the example cases is applied to the network, and the network produces some output based on the current state of its synaptic weights (initially, the output will be random). This output is compared to the known-good output, and a mean-squared error signal is calculated. The error value is then propagated backwards through the network, and small changes are made to the weights in each layer. The weight changes are calculated to reduce the error signal for the case in question. The whole process is repeated for each of the example cases, then back to the first case again, and so on. The cycle is repeated until the overall error value drops below some pre-determined threshold. At this point we say that the network has learned the problem "well enough" - the network will never exactly learn the ideal function, but rather it will asymptotically approach the ideal function.

When to use (or not!) a BP Neural Network Solution

A back-propagation neural network is only practical in certain situations. Following are some guidelines on when you should use another approach:

=> Can you write down a flow chart or a formula that accurately describes the problem? If so, then stick with a traditional programming method.
=> Is there a simple piece of hardware or software that already does what you want? If so, then the development time for a NN might not be worth it.
=> Do you want the functionality to "evolve" in a direction that is not pre-defined? If so, then consider using a Genetic Algorithm (that's another topic!).
=> Do you have an easy way to generate a significant number of input/output examples of the desired behavior? If not, then you won't be able to train your NN to do anything.
=> Is the problem very "discrete"? Can the correct answer be found in a look-up table of reasonable size? A look-up table is much simpler and more accurate.
=> Are precise numeric output values required? NNs are not good at giving precise numeric answers.

Conversely, here are some situations where a BP NN might be a good idea:

=> A large amount of input/output data is available, but you're not sure how to relate the inputs to the outputs.
=> The problem appears to have overwhelming complexity, but there is clearly a solution.
=> It is easy to create a number of examples of the correct behavior.
=> The solution to the problem may change over time, within the bounds of the given input and output parameters (i.e., today 2+2=4, but in the future we may find that 2+2=3.8).
=> Outputs can be "fuzzy", or non-numeric.

One of the most common applications of NNs is in image processing. Some examples would be: identifying hand-written characters; matching a photograph of a person's face with a different photo in a database; performing data compression on an image with minimal loss of content. Other applications could be: voice recognition; RADAR signature analysis; stock market prediction.
All of these problems involve large amounts of data and complex relationships between the different parameters. It is important to remember that with a NN solution, you do not have to understand the solution at all! This is a major advantage of NN approaches. With more traditional techniques, you must understand the inputs, the algorithms, and the outputs in great detail to have any hope of implementing something that works.

With a NN, you simply show it: "this is the correct output, given this input". With an adequate amount of training, the network will mimic the function that you are demonstrating. Further, with a NN, it is OK to apply some inputs that turn out to be irrelevant to the solution - during the training process, the network will learn to ignore any inputs that don't contribute to the output. Conversely, if you leave out some critical inputs, then you will find out, because the network will fail to converge on a solution. If your goal is stock market prediction, you don't need to know anything about economics, you only need to acquire the input and output data (most of which can be found in the Wall Street Journal).

Robotics Applications?

A fairly simple home-built robot probably doesn't have much need for a Neural Network. However, with larger-scale projects, there are many difficult problems to be solved. A robot that walks on two legs will have some sort of gyro or accelerometer system that is equivalent to the human inner ear. This data must be processed along with the position of each part of the body, and with variations in the terrain. A robot that responds to a variety of voice commands must analyze the time, amplitude, and frequency components of what it hears, and compare them to a known vocabulary. A game-playing robot must respond to the unpredictable behavior of its opponent. Also, it may want to "learn from experience" how to play a better game.
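To tie the two halves of this article together, here is a compact sketch of one complete back-propagation training step for a 3-layer network, combining the feed-forward pass with the learning-rate and momentum update described earlier. All names, the logistic activation, and the omission of bias inputs are illustrative assumptions, not code from either source; the Squash and Forward helpers repeat the earlier sketch so this block stands alone:

    #include <cmath>
    #include <vector>

    typedef std::vector<double> Vec;
    typedef std::vector<Vec> Mat;      // weights indexed as w[to][from]

    static double Squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    static Vec Forward(const Vec& in, const Mat& w)
    {
        Vec out(w.size());
        for(size_t j = 0; j < w.size(); j++) {
            double sum = 0.0;
            for(size_t i = 0; i < in.size(); i++)
                sum += in[i] * w[j][i];
            out[j] = Squash(sum);
        }
        return out;
    }

    // One training step: feed one pattern forward, compute the errors,
    // then propagate corrections backwards through both weight layers.
    void TrainOnePattern(const Vec& dInput, const Vec& dDesired,
                         Mat& w1, Mat& w2,        // weights to adjust
                         Mat& dw1, Mat& dw2,      // previous changes (momentum)
                         double dLearningRate, double dMomentum)
    {
        Vec hidden = Forward(dInput, w1);
        Vec output = Forward(hidden, w2);

        // Output-layer error terms: error scaled by the activation's slope.
        Vec deltaOut(output.size());
        for(size_t k = 0; k < output.size(); k++)
            deltaOut[k] = (dDesired[k] - output[k]) * output[k] * (1.0 - output[k]);

        // Hidden-layer error terms: each hidden neuron takes the blame in
        // proportion to the weights connecting it to the output layer.
        Vec deltaHid(hidden.size());
        for(size_t j = 0; j < hidden.size(); j++) {
            double sum = 0.0;
            for(size_t k = 0; k < output.size(); k++)
                sum += deltaOut[k] * w2[k][j];
            deltaHid[j] = sum * hidden[j] * (1.0 - hidden[j]);
        }

        // Small weight changes: a learning-rate step plus a momentum term
        // that keeps us moving while the direction stays the same.
        for(size_t k = 0; k < w2.size(); k++)
            for(size_t j = 0; j < hidden.size(); j++) {
                dw2[k][j] = dLearningRate * deltaOut[k] * hidden[j]
                          + dMomentum * dw2[k][j];
                w2[k][j] += dw2[k][j];
            }
        for(size_t j = 0; j < w1.size(); j++)
            for(size_t i = 0; i < dInput.size(); i++) {
                dw1[j][i] = dLearningRate * deltaHid[j] * dInput[i]
                          + dMomentum * dw1[j][i];
                w1[j][i] += dw1[j][i];
            }
    }

Calling TrainOnePattern for every example case, cycle after cycle, until the overall error drops below a chosen threshold gives exactly the iterative BP learning loop described above.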