
The single-layer perceptron network is the most straightforward kind of neural network.

The single-layer perceptron network consists of a single layer of output nodes. The input data is passed through a series of weights to the output nodes, and each output node computes the sum of the products of the weights and the inputs. If this calculated sum is above some threshold value (typically 0), the neuron fires and takes the activated value (typically 1); if the sum is below the threshold, the neuron takes the deactivated value (typically -1). Neurons with this sort of activation function are called artificial neurons or linear threshold units, and the term perceptron refers to networks consisting of one of these units. A similar kind of neuron was described by Warren McCulloch and Walter Pitts in the 1940s. A perceptron can be created using any activated and deactivated values, as long as the threshold value lies between the two. Perceptrons are trained by a simple learning algorithm called the delta rule, which calculates the error between the calculated output and the sample output data.
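As a rough illustration (not taken from the text above), the following Python sketch implements such a linear threshold unit and trains it with the thresholded-output form of the delta rule; the names predict, delta_rule_train and learning_rate are chosen here for illustration only.

```python
import numpy as np

def predict(weights, bias, x, threshold=0.0):
    """Linear threshold unit: output the activated value (1) if the weighted
    sum exceeds the threshold, otherwise the deactivated value (-1)."""
    return 1 if np.dot(weights, x) + bias > threshold else -1

def delta_rule_train(samples, targets, learning_rate=0.1, epochs=20):
    """Adjust the weights using the error between the calculated output and
    the sample output, as described above."""
    weights = np.zeros(samples.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            error = t - predict(weights, bias, x)
            weights += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias

# Example: the linearly separable AND function with -1/+1 output encoding.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([-1, -1, -1, 1])
w, b = delta_rule_train(X, T)
print([predict(w, b, x) for x in X])   # expected: [-1, -1, -1, 1]
```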

The calculated error is used to adjust the weights, and this adjustment is carried out with the gradient descent algorithm. However, single-layer perceptrons are only capable of learning patterns that are linearly separable. In 1969, in a study titled Perceptrons, Marvin Minsky and Seymour Papert showed that a single-layer perceptron network cannot learn the XOR function. Multi-layer perceptrons consist of many computational units arranged in multiple layers and interconnected in a feed-forward way: each neuron in one layer is connected to the neurons of the next layer. In many applications these neurons use a sigmoid function as their activation function.
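The feed-forward structure itself is simple to state in code. The sketch below is illustrative only; the 2-2-1 layer sizes and random weights are arbitrary assumptions, not something from the text. It passes an input through two fully connected layers with sigmoid activations.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation applied by each neuron."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Feed-forward pass: every neuron of one layer feeds every neuron of the
    next layer; each layer is given as a (weights, biases) pair."""
    activation = x
    for weights, biases in layers:
        activation = sigmoid(weights @ activation + biases)
    return activation

# A 2-2-1 network: two inputs, one hidden layer of two neurons, one output.
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((2, 2)), rng.standard_normal(2)),  # hidden layer
    (rng.standard_normal((1, 2)), rng.standard_normal(1)),  # output layer
]
print(forward(np.array([0.0, 1.0]), layers))
```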

According to the universal approximation theorem for neural networks, every continuous function that maps an interval of real numbers to some interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. The theorem holds for a wide class of activation functions, for example the sigmoidal functions.
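In symbols, a standard statement of this result (Cybenko's form for sigmoidal units, given here to make the claim concrete rather than quoted from the text above) is that for any continuous function f on a compact set and any ε > 0 there exist a number of hidden units N and parameters v_i, w_i, b_i such that

\[
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right), \qquad \sup_{x} \left| F(x) - f(x) \right| < \varepsilon .
\]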

The back-propagation algorithm is the most popular learning algorithm for multi-layer feed-forward neural networks. It evaluates a pre-defined error function by comparing the network's output values with the correct answers. The error is then fed back through the network, and the algorithm adjusts the weights of each connection so as to decrease the value of the error function by a small amount. After this process has been repeated for a sufficiently large number of training cycles, the network usually converges to a state where the error of the calculations is small; the network is then said to have learned the target function.
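A minimal sketch of this training loop for a one-hidden-layer network with sigmoid activations and a squared-error function is shown below. It learns the XOR patterns mentioned earlier; the layer sizes, learning rate and number of cycles are illustrative choices, and whether it converges depends on the random initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: the pattern a single-layer perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)   # hidden layer (3 units)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)   # output layer
lr = 0.5

for cycle in range(20000):                     # many training cycles
    # Forward pass.
    h = sigmoid(X @ W1.T + b1)                 # hidden activations
    y = sigmoid(h @ W2.T + b2)                 # network outputs
    error = 0.5 * np.sum((y - T) ** 2)         # pre-defined error function

    # Backward pass: feed the error back through the network.
    delta_out = (y - T) * y * (1 - y)          # error signal at the output
    delta_hid = (delta_out @ W2) * h * (1 - h) # error signal at the hidden layer

    # Gradient-descent step: move each weight a small amount downhill.
    W2 -= lr * delta_out.T @ h
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * delta_hid.T @ X
    b1 -= lr * delta_hid.sum(axis=0)

print(round(error, 4), np.round(y.ravel(), 2))  # outputs typically near [0, 1, 1, 0]
```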

A general method for non-linear optimization called gradient descent is applied to alter the weights
appropriately.

For this, the network calculates the derivative of the error function with respect to the network
weights, and changes the weights such that the error decreases (thus going downhill on the surface
of the error function).
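Written as a formula (standard textbook notation with learning rate η; not a formula given in the text itself), the change applied to each weight w_{ij} is

\[
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}},
\]

where E is the error function, so the weights move in the direction of steepest descent of E.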

For this reason, back-propagation can only be applied to networks with differentiable activation functions.

In general, the problem of teaching a network to perform well even on samples that were not used as training samples is quite a subtle issue that requires additional techniques.

This is especially important in cases where only a very limited number of training samples is available.
The danger is that the network overfits the training data and fails to capture the true statistical
process generating the data.

Computational learning theory is concerned with training classifiers on a limited amount of data. In the context of neural networks, a simple heuristic called early stopping often ensures that the network will generalize well to examples not in the training set.
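A minimal sketch of the early-stopping heuristic is given below, assuming a held-out validation set; train_one_epoch and validation_error are hypothetical helpers supplied by the caller, not functions from any particular library.

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    """Early stopping: keep training while the error on held-out validation
    samples improves, and stop once it has not improved for `patience` epochs.

    `train_one_epoch()` should run one back-propagation pass over the training
    set and return the current weights; `validation_error(weights)` should
    return the error on samples not used for training (both are placeholders).
    """
    best_error = float("inf")
    best_weights = None
    stale_epochs = 0

    for _ in range(max_epochs):
        weights = train_one_epoch()
        error = validation_error(weights)
        if error < best_error:              # still generalizing better
            best_error, best_weights = error, weights
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:    # likely starting to overfit
                break

    return best_weights, best_error
```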
