
Neural Net

Synopsis Learns a neural net from the input data.

Description This operator learns a model by means of a feed-forward neural network trained by a backpropagation algorithm (multi-layer perceptron). The user can define the structure of the neural network with the parameter list "hidden_layers". Each list entry describes a new hidden layer. The key of each entry corresponds to the layer name, which can be chosen arbitrarily; it is only used for displaying the model. The value of each entry must be a number defining the size of the hidden layer. Note that the actual number of nodes will be one more than the value specified, since an additional constant node is added to each layer. This node is not connected to the preceding layer. A size value of -1 indicates that the layer size should be calculated from the number of attributes of the input example set. In this case, the layer size will be set to (number of attributes + number of classes) / 2 + 1. If the user does not specify any hidden layers, a default hidden layer with sigmoid type and size (number of attributes + number of classes) / 2 + 1 will be created and added to the net. If only a single layer without nodes is specified, the input nodes are directly connected to the output nodes and no hidden layer is used.

The activation function used is the usual sigmoid function. Therefore, the value ranges of the attributes should be scaled to -1 and +1. This is also done by this operator if not specified otherwise by the corresponding parameter setting. The type of the output node is sigmoid if the learning data describes a classification task and linear for numerical regression tasks.
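The default sizing rule and the sigmoid activation can be made concrete with a small sketch (Python/NumPy purely for illustration; the operator itself is configured through its parameters, and all function names below are hypothetical):

```python
import numpy as np

def default_hidden_size(n_attributes, n_classes):
    # Default layer size when no layers are given (or size is -1):
    # (number of attributes + number of classes) / 2 + 1
    return (n_attributes + n_classes) // 2 + 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # One hidden layer with sigmoid units; the constant node of each
    # layer is modeled here as a bias term rather than an extra unit.
    h = sigmoid(x @ w_hidden + b_hidden)
    return sigmoid(h @ w_out + b_out)   # sigmoid output (classification case)
```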

Input

training set: expects: ExampleSet

Output

model:
exampleSet:

Parameters

hidden layers: Describes the name and the size of all hidden layers. Range: list
training cycles: The number of training cycles used for the neural network training. Range: integer; 1-+∞; default: 500

learning rate: The learning rate determines by how much the weights are changed at each step. May not be 0. Range: real; 4.9E-324-1.0
momentum: The momentum simply adds a fraction of the previous weight update to the current one; this helps prevent getting stuck in local optima and smooths the optimization direction (see the sketch after this parameter list). Range: real; 0.0-1.0
decay: Indicates if the learning rate should be decreased during learning. Range: boolean; default: false
shuffle: Indicates if the input data should be shuffled before learning (increases memory usage but is recommended if the data is sorted). Range: boolean; default: true
normalize: Indicates if the input data should be normalized between -1 and +1 before learning (increases runtime but is in most cases necessary). Range: boolean; default: true
error epsilon: The optimization is stopped if the training error gets below this epsilon value. Range: real; 0.0-+∞
use local random seed: Indicates if a local random seed should be used. Range: boolean; default: false
local random seed: Specifies the local random seed. Range: integer; 1-+∞; default: 1992
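The interplay of learning rate and momentum can be written out as a single weight update (an illustrative sketch; all names are hypothetical):

```python
def update_weight(weight, gradient, previous_delta,
                  learning_rate=0.3, momentum=0.2):
    # The momentum term re-adds a fraction of the previous update, which
    # smooths the optimization direction and helps escape local optima.
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta    # new weight, and the delta to remember
```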

X-Validation
Synopsis X-Validation encapsulates a cross-validation in order to estimate the performance of a learning operator.

Description X-Validation performs a cross-validation process. The input ExampleSet S is split up into "number of validations" subsets S_i. The inner subprocesses are applied "number of validations" times, using S_i as the test set (input of the Testing subprocess) and S \ S_i as the training set (input of the Training subprocess). The Training subprocess must return a model, which is usually trained on the input ExampleSet. The Testing subprocess must return a PerformanceVector, which is usually generated by applying the model and measuring its performance. Additional objects may be passed from the Training to the Testing subprocess using the through ports.

Please note that the performance calculated by this estimation scheme is only an estimate of the performance that would be achieved with the model built on the complete delivered data set, not an exact calculation. Exactly this model, i.e. the one built on the complete input data, is delivered at the corresponding port in order to give convenient access to it.

Like other validation schemes, the RapidMiner cross-validation can use several types of sampling for building the subsets: linear sampling simply divides the example set into partitions without changing the order of the examples; shuffled sampling builds random subsets from the data; stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole example set. To make the random splits independent from the rest of the process, a local random seed might be used. See the parameters for details.

The cross-validation operator provides several values which can be logged by means of a Log operator. Of course, the number of the current iteration can be logged, which might be useful for ProcessLog operators wrapped inside a cross-validation. Besides that, all performance estimation operators of RapidMiner provide access to the average values calculated during the estimation. Since the operator cannot ensure the names of the delivered criteria, the ProcessLog operator can access the values via the generic value names:

performance: the value for the main criterion calculated by this validation operator
performance1: the value of the first criterion of the performance vector calculated
performance2: the value of the second criterion of the performance vector calculated
performance3: the value of the third criterion of the performance vector calculated

For the main criterion, the variance and the standard deviation can also be accessed where applicable.
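The splitting scheme can be illustrated with a short sketch (Python/NumPy purely for illustration; train_and_test is a hypothetical stand-in for the Training and Testing subprocesses):

```python
import numpy as np

def cross_validate(X, y, train_and_test, k=10, seed=1992):
    # Shuffled sampling: build k random subsets S_i of the input data.
    rng = np.random.default_rng(seed)       # local random seed
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test = folds[i]                     # S_i as test set
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # S \ S_i
        scores.append(train_and_test(X[train], y[train], X[test], y[test]))
    # The estimate is the average performance over all k iterations.
    return np.mean(scores), np.std(scores)
```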

Input

training: expects: ExampleSet

Output

model:
training:
averagable 1:
averagable 2:

Parameters

create complete model: Indicates if a model of the complete data set should be additionally built after estimation. Range: boolean; default: false
average performances only: Indicates if only performance vectors should be averaged or all types of averagable result vectors. Range: boolean; default: true
leave one out: Set the number of validations to the number of examples. If set to true, number_of_validations is ignored. Range: boolean; default: false
number of validations: Number of subsets for the cross-validation. Range: integer; 2-+∞; default: 10
sampling type: Defines the sampling type of the cross-validation (linear = consecutive subsets, shuffled = random subsets, stratified = random subsets with the class distribution kept constant). Range: linear sampling, shuffled sampling, stratified sampling; default: stratified sampling

use local random seed: Indicates if a local random seed should be used. Range: boolean; default: false
local random seed: Specifies the local random seed. Range: integer; 1-+∞; default: 1992

Performance
Synopsis This operator delivers a list of performance values automatically determined in order to fit the learning task type.

Description In contrast to the other performance evaluation methods, like for example Performance (Classification), Performance (Binominal Classification) or Performance (Regression), this operator can be used for all types of learning tasks. It automatically determines the learning task type and calculates the most common criteria for this type. For more sophisticated performance calculations, you should use the operators mentioned above. If none of them suits your needs, you can write your own performance measure and calculate it with Performance (User-Based).

This operator expects a test ExampleSet as input, containing one attribute with the role label and one with the role prediction. See the Set Role operator for more details. On the basis of these two attributes a PerformanceVector is calculated, containing the values of the performance criteria. If a PerformanceVector was fed into the performance input, its values are kept if it does not already contain the new criteria; otherwise the values are averaged over the old and the new values.

The following criteria are added for binominal classification tasks:

Accuracy
Precision
Recall
AUC (optimistic)
AUC (neutral)
AUC (pessimistic)

The following criteria are added for polynominal classification tasks:


Accuracy
Kappa statistic

The following criteria are added for regression tasks:


Root Mean Squared Error
Mean Squared Error
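To make the nature of these criteria concrete, here is how two of them could be computed from the label and prediction columns (a minimal sketch, not the operator's implementation):

```python
import numpy as np

def accuracy(label, prediction):
    # Fraction of examples whose prediction matches the label
    return np.mean(label == prediction)

def root_mean_squared_error(label, prediction):
    # RMSE for regression tasks
    return np.sqrt(np.mean((label - prediction) ** 2))
```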

Parameters

use example weights: Indicates if example weights should be used for performance calculations if possible. Range: boolean; default: true

Series:Moving Average
Synopsis Generates a new attribute containing the moving average series of a series attribute.

Description Creates a new series attribute which contains the moving average of a series. The calculation of a series moving average uses a window of fixed size that is moved over the series data. At any position, the values that lie in the window are aggregated according to a specified function. The aggregated value forms the moving average value which is put into the result series.
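The windowed aggregation can be sketched as follows (Python/NumPy for illustration, assuming the defaults of a rectangular window with the result placed at the window end):

```python
import numpy as np

def moving_average(series, window_width=5, aggregation=np.mean):
    # Slide a fixed-size window over the series; aggregate the values in
    # each window and place the result at the window's end position.
    result = np.full(len(series), np.nan)   # no full window yet -> missing
    for end in range(window_width - 1, len(series)):
        result[end] = aggregation(series[end - window_width + 1 : end + 1])
    return result
```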

Input

example set input: expects: ExampleSetMetaData: #examples: ≥ 0; #attributes: ≥ 0, expects: ExampleSet

Output

example set output:
original:

Parameters

attribute name: The name of the original series attribute from which the moving average series should be calculated. Range: string
window width: The width of the moving average window. Range: integer; 2-+∞; default: 5
aggregation function: The aggregation function that is used for aggregating the values in the window. Range: average, variance, standard_deviation, count, minimum, maximum, sum, mode, median, product; default: average
ignore missings: Ignore missing values in the aggregation of window values. Range: boolean; default: false
result position: Defines where in the window the result should be placed in the moving average series. Range: start, center, end; default: end
window weighting: The window function that should be used to weight the values in the window for the aggregation (a sketch of this weighting follows the parameter list). Range: Rectangular, Triangular, Gaussian, Hann, Hamming, Blackman, Blackman-Harris, Bartlett; default: Rectangular

keep original attribute: Indicates whether the original attribute should be kept in the data set. Range: boolean; default: true
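For the non-rectangular window weightings, the values in the window are multiplied by a window function before aggregation. A sketch of the assumed semantics for a triangular window (not the operator's exact code):

```python
import numpy as np

def triangular_weighted_average(window):
    # Bartlett/triangular weights: rise linearly to the center, then fall.
    w = np.bartlett(len(window))
    return np.sum(w * window) / np.sum(w)
```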

Series:Fit Trend
Synopsis Adds a trend for an attribute which is generated by an inner regression learner.

Description Adds a trend line for a specified series attribute by regressing a dummy variable onto the series attribute. The trend line can have an arbitrary shape, which is defined by the inner learner used in this operator. For a linear trend line, add a linear regression learner as the inner operator. For nonlinear trend lines, add e.g. an SVM learner with a polynomial kernel or a Radial Basis Function kernel.
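The linear case can be sketched as follows (Python/NumPy for illustration; the operator itself delegates the regression to its inner learner):

```python
import numpy as np

def fit_linear_trend(series):
    # Regress the series onto a dummy index variable (0, 1, 2, ...);
    # a nonlinear inner learner would produce a nonlinear trend instead.
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)
    return intercept + slope * t    # fitted trend values
```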

Input

example set: expects: ExampleSet, expects: ExampleSet

Output

example set with trend:

Parameters

attribute: The attribute for which the trend should be added. Range: string
keep original attribute: Indicates whether the original attribute should be kept in the data set. Range: boolean; default: true

NEURAL ARCHITECTURE

The unit of a neural network is modeled on the biological neuron. The unit combines its inputs into a single value, which it then transforms to produce the output; together these two steps are called the activation function.

Output = weighted sum + bias
Activation function = combination function + transfer function

TRAINING

Training is the process of setting the best weights on the edges connecting all the units in the network. The goal is to use the training set to calculate weights where the output of the network is as close to the desired output as possible for as many of the examples in the training set as possible.

Back propagation has been used since the 1980s to adjust the weights (other methods are now available): it calculates the error by taking the difference between the calculated result and the actual result; the error is then fed back through the network and the weights are adjusted to minimize the error.
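One backpropagation step for a single sigmoid unit can be sketched like this (a minimal NumPy illustration, not RapidMiner's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, w, b, learning_rate=0.3):
    out = sigmoid(x @ w + b)          # forward: weighted sum + bias, then transfer
    error = out - target              # difference between calculated and actual result
    grad = error * out * (1.0 - out)  # error fed back through the sigmoid
    w = w - learning_rate * grad * x  # adjust weights to reduce the error
    b = b - learning_rate * grad
    return w, b
```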

BENEFITS

Generalization and Robustness - A powerful technique for solving many real-world problems, due to the ability to learn from experience in order to improve performance and to adapt to changes in the environment.

Flexibility - Able to deal with incomplete information or noisy data, and can be very effective especially in situations where it is not possible to define the rules or steps that lead to the solution of a problem.

Parallelism - Neural networks normally have great potential for parallelism, since the computations of the components are largely independent of each other.

No assumptions about the model have to be made.

LIMITATIONS

Neural nets are good for prediction and estimation when: the inputs are well understood, the output is well understood, and experience is available with examples that can be used to train the neural net application (expert system).

Neural nets are only as good as the training set used to generate them. The resulting model is static and must be updated with more recent examples and retraining for it to stay relevant.

Overtraining - Overtraining occurs when the system memorizes patterns and thus loses the ability to generalize. It is an important factor in these prediction systems, as their primary use is to predict (or generalize) on input data they have never seen.

Blackbox - Training a neural network results in internal weights distributed throughout the network, making it difficult to understand why a solution is valid.
