Add Recurrent Layer

Synopsis

Add a recurrent layer to your neural net structure.

Description

This operator can be placed inside the subprocess of the Deep Learning, Deep Learning (Tensor) or Autoencoder operator. It adds a recurrent layer to your neural net structure.

Recurrent neural networks (RNNs) are designed to recognize patterns in sequences of data. Sequence data come in many forms and are best described by the types of inputs and outputs they involve. In a one-to-many relation the input has a fixed size, such as an image, while the output is a sequence of variable length, such as a caption. In a many-to-one relation the input is a sequence, e.g. a piece of text, and the output is a single value, e.g. a sentiment label (positive or negative). RNNs can also model many-to-many relations, as in machine translation, where the input is an English sentence of variable length and the output is a French sentence of a different variable length. Such tasks require models that accept variable-length sequences on both the input and the output side. RNNs can also be useful for problems with fixed-size input and fixed-size output.

In general, a recurrent layer consists of multiple neurons, and each neuron contains one recurrent core cell. For each example, the individual parts of the sequence are fed in one by one, changing the hidden state each time. Internally, the recurrent core cell computes a recurrence function that accepts the previous hidden state and the input at the current step and outputs the updated hidden state. The updated hidden state is fed back into the same function at the next step, and the final state is used to produce the output. The function and its weights remain the same throughout the computation.
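
To make the recurrence concrete, the following is a minimal NumPy sketch of one recurrent core cell step. It only illustrates the formula described above (new hidden state = f(previous hidden state, current input)); it is not the operator's internal implementation, and the function name rnn_step and all sizes are invented for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent core cell update: combine the current input with the
    previous hidden state and squash the result with tanh (the default
    activation for a simple recurrent layer)."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Feed the parts of one example sequence in one by one,
# updating the hidden state at every time step.
input_size, hidden_size, seq_len = 4, 8, 10
rng = np.random.default_rng(0)
W_x = rng.normal(size=(input_size, hidden_size)) * 0.1   # input weights
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # recurrent weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                 # initial hidden state
for x_t in rng.normal(size=(seq_len, input_size)):
    h = rnn_step(x_t, h, W_x, W_h, b)     # same function and weights every step
print(h.shape)  # (8,) -- the hidden state after the full sequence
```

Note that the same weights W_x, W_h and bias b are reused at every time step, which is what the description above means by the function and weights remaining the same throughout the computation.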

Input

layerArchitecture

The network configuration set up by previous operators. Connect this port to the layerArchitecture output port of another add layer operator, or to the layer port of the "Deep Learning" operator if this layer is the first one.

Output

layerArchitecture

The network with the configuration for this recurrent layer added. Connect this port to the next input port of another layer or the layer port on the right side of the "Deep Learning" operator.

Parameters

Neurons

Provide the number of neurons used in this layer. A neuron can be seen as a new attribute that takes into account information from all neurons of the previous layer. Each recurrent neuron has two sets of weights: one applied to the current input value and one applied to its own output from the previous time step; a bias value is added to the weighted sum. Afterwards the so-called activation function is applied to decide whether an input should be taken into account or not.

Because the output of a recurrent neuron at time step t is a function of all inputs from previous time steps, it can be said to have a form of memory. The part of such a neural network that preserves state across time steps is referred to as a "memory cell". A single layer of recurrent neurons forms a very basic cell; more complex and powerful types of cells exist.
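
As a rough illustration of how the number of neurons determines the size of these two weight sets, the sketch below counts the parameters of a simple recurrent layer. The helper simple_rnn_parameters and the sizes used are hypothetical and not part of the operator; the count assumes a plain recurrent cell as described above.

```python
def simple_rnn_parameters(input_size, neurons):
    """Rough parameter count of a simple recurrent layer: one weight per
    input feature and one weight per neuron output of the previous time
    step (the 'memory'), plus one bias, for each of the `neurons` cells."""
    input_weights = input_size * neurons      # current input -> neurons
    recurrent_weights = neurons * neurons     # previous outputs -> neurons
    biases = neurons
    return input_weights + recurrent_weights + biases

print(simple_rnn_parameters(input_size=4, neurons=8))  # 4*8 + 8*8 + 8 = 104
```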

Activation function

Activation functions allow networks to create complex nonlinear decision boundaries. Mathematically speaking, the chosen activation function is wrapped around the result of multiplying the weights with the input data and adding the bias. Hence activation functions ensure that a layer's output is within a given range, and a general decision whether to use the output or not can be made. The default activation for SimpleRNN is tanh.

Because these nonlinear functions increase the computational load during training, choosing a simple function (with a monotonic derivative) is recommended in many situations.

Choosing the activation function for the last layer of a network is slightly different from previous layers. At this point the activation function provides a conversion from the internal network state to the expected output. For regression tasks "None (identity)" might be chosen, while for classification problems "Softmax" converts the results to probabilities for the given class values (see the short numerical sketch after the list below).

  • ReLU (Rectified Linear Unit): Rectified linear unit. Activation function is max(0, x). Monotonic derivative.
  • Sigmoid: Sigmoid or logistic function. Non-monotonic derivative. Sensitive to small changes in the present data. Results are in the range between 0 and 1.
  • Softmax: Softmax or normalized exponential function. Resulting values are in a range between 0 and 1, while adding up to one. Hence this function can be used to map values to probability like values.
  • TanH: TanH function, similar to the sigmoid function. Non-monotonic derivative with values in the range between -1 and +1.
  • Cube: Cubic function. Output is the cubic of input values. https://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf
  • ELU (Exponential Linear Unit): Same as ReLU for values above zero, but an exponential function below. Hence the derivative is only monotonic for values above zero.
  • GELU (Gaussian Error Linear Unit): Gaussian Error Linear Unit. Activation function is x * Phi(x), with Phi(x) as the standard Gaussian cumulative distribution function. Difference to ReLU: input is weighted based on its value instead of its sign. https://arxiv.org/abs/1606.08415 Sigmoid version of the implementation is used.
  • MISH: A self-regularized non-monotonic activation function. Activation function is x tanh (ln(1 + exp(x))). https://arxiv.org/abs/1908.08681 Sigmoid version of the implementation is used.
  • Leaky ReLU: Same as ReLU for values above zero, but with a linear function for values below. Monotonic derivative.
  • Rational TanH: Rational TanH approximation, element-wise function.
  • Randomized ReLU: Similar to ReLU but with a randomly chosen scaling factor for the linearity. Monotonic derivative.
  • Rectified TanH: Similar to ReLU, but with a TanH function for positive values instead of a linearity. Non-monotonic derivative.
  • Softplus: A logarithmic function with values ranging from zero to infinity. Monotonic derivative.
  • Softsign: Similar to TanH with same range and monotonicity but less prone to changes.
  • SELU (Scaled ELU): Scaled exponential linear unit. Similar to ELU, but with a scaling factor. Non-monotonic derivative. https://arxiv.org/pdf/1706.02515.pdf
  • None (identity): Output equals input. This function can be used e.g. within the last layer of a network to obtain a regression result. Monotonic derivative.
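
The short NumPy sketch below evaluates a few of the listed functions on a sample vector, to illustrate their output ranges and why Softmax is suited to producing class probabilities. It is an independent illustration of the mathematical definitions, not the operator's implementation; the sample values are arbitrary.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

relu     = np.maximum(0.0, x)           # max(0, x)
sigmoid  = 1.0 / (1.0 + np.exp(-x))     # values in (0, 1)
tanh     = np.tanh(x)                   # values in (-1, 1)
softplus = np.log1p(np.exp(x))          # smooth ReLU, values in (0, inf)

# Softmax maps a vector to probability-like values that add up to one,
# which is why it is typically used in the last layer of a classifier.
z = np.exp(x - x.max())                 # shift for numerical stability
softmax = z / z.sum()
print(round(softmax.sum(), 6))          # 1.0
```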

Layer name

Provide a name for the layer for ease of identification, when inspecting the sequentialModel or re-using it.