Deep Learning
Synopsis
Executes the Deep Learning algorithm using H2O 3.30.0.1.
Description
Please note that this algorithm is deterministic only if the reproducible parameter is set to true; in that case only one thread is used.
Deep Learning is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back-propagation. The network can contain a large number of hidden layers consisting of neurons with tanh, rectifier and maxout activation functions. Advanced features such as adaptive learning rate, rate annealing, momentum training, dropout and L1 or L2 regularization enable high predictive accuracy. Each compute node trains a copy of the global model parameters on its local data with multi-threading (asynchronously), and contributes periodically to the global model via model averaging across the network.
The operator starts a 1-node local H2O cluster and runs the algorithm on it. Although it uses one node, the execution is parallel. You can set the level of parallelism by changing the Settings/Preferences/General/Number of threads setting. By default it uses the recommended number of threads for the system. Only one instance of the cluster is started and it remains running until you close RapidMiner Studio.
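For orientation, here is a minimal sketch of roughly what the operator does behind the scenes, expressed with the H2O Python API (the operator handles all of this internally; the file path, column names and parameter values below are only illustrative assumptions):

    import h2o
    from h2o.estimators import H2ODeepLearningEstimator

    # Start a local H2O cluster; nthreads=-1 uses all available cores.
    h2o.init(nthreads=-1)

    # Hypothetical labeled data set; assume the label column is called "label".
    train = h2o.import_file("training_set.csv")
    predictors = [c for c in train.columns if c != "label"]

    # Multi-layer feed-forward network trained with stochastic gradient descent.
    model = H2ODeepLearningEstimator(
        activation="rectifier",   # non-linearity used in the hidden layers
        hidden=[50, 50],          # two hidden layers with 50 neurons each
        epochs=10,                # passes over the data; may be fractional, e.g. 0.5
        l1=1e-5,                  # L1 regularization
    )
    model.train(x=predictors, y="label", training_frame=train)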
Input
training set
The input port expects a labeled ExampleSet.
Output
model
The Deep Learning classification or regression model is delivered from this output port. It can be applied to unseen data sets to predict the label attribute.
example set
The ExampleSet that was given as input is passed through to this output port without changes. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.
weights
This port delivers the weights of the attributes with respect to the label attribute.
Parameters
Activation
The activation function (non-linearity) to be used by the neurons in the hidden layers. The basic options are illustrated by the short sketch that follows this list.
- Tanh: Hyperbolic tangent function (same as scaled and shifted sigmoid).
- Rectifier: Rectified Linear Unit: chooses the maximum of 0 and x, where x is the input value.
- Maxout: Chooses the maximum coordinate of the input vector.
- ExpRectifier: Exponential Rectified Linear Unit function.
- With Dropout options: Zero out a random, user-given fraction of the incoming weights to each hidden layer during training, for each training row. This effectively trains exponentially many models at once and can improve generalization. In this case the hidden dropout ratios parameter is used.
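To make the formulas above concrete, here is an illustrative NumPy sketch of the three basic activations; this is not H2O's internal implementation, only the underlying math:

    import numpy as np

    def tanh(x):
        # Hyperbolic tangent: a scaled and shifted sigmoid, output in (-1, 1).
        return np.tanh(x)

    def rectifier(x):
        # Rectified Linear Unit: max(0, x) applied element-wise.
        return np.maximum(0.0, x)

    def maxout(x):
        # Maxout: the maximum coordinate of the input vector.
        return np.max(x)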
Hidden layer sizes
The number and size of each hidden layer in the model. For example, if a user specifies "100,200,100" a model with 3 hidden layers will be produced, and the middle hidden layer will have 200 neurons.
Hidden dropout ratios
A fraction of the inputs for each hidden layer to be omitted from training in order to improve generalization. Defaults to 0.5 for each hidden layer if omitted. Visible only if an activation function with dropout is selected.
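As a sketch of how these two parameters relate (using the H2O Python API for illustration; the layer sizes and ratios are arbitrary example values):

    from h2o.estimators import H2ODeepLearningEstimator

    # "100,200,100" in the operator corresponds to three hidden layers,
    # the middle one with 200 neurons; one dropout ratio is given per layer.
    model = H2ODeepLearningEstimator(
        activation="rectifier_with_dropout",
        hidden=[100, 200, 100],
        hidden_dropout_ratios=[0.5, 0.5, 0.5],  # default is 0.5 per hidden layer
    )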
Reproducible
Force reproducibility on small data (will be slow - only uses 1 thread).
Use local random seed
Indicates if a local random seed should be used for randomization. Available only if reproducible is set to true.
Local random seed
Local random seed for random generation. This parameter is only available if the use local random seed parameter is set to true.
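A sketch of how the reproducible flag and the seed interact, again using the H2O Python API for illustration (the seed value is arbitrary):

    from h2o.estimators import H2ODeepLearningEstimator

    # reproducible=True forces single-threaded training so that the same
    # seed always yields the same model; expect it to be slow on larger data.
    model = H2ODeepLearningEstimator(
        hidden=[50, 50],
        reproducible=True,
        seed=1234,
    )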
Epochs
How many times the dataset should be iterated (streamed); can be fractional.
Compute variable importances
Whether to compute variable importances for input features. The implemented method considers the weights connecting the input features to the first two hidden layers.
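A sketch of enabling variable importances and reading them back via the H2O Python API (the file path and label column name are hypothetical):

    import h2o
    from h2o.estimators import H2ODeepLearningEstimator

    h2o.init()
    train = h2o.import_file("training_set.csv")  # hypothetical path, label column "label"

    model = H2ODeepLearningEstimator(
        hidden=[50, 50],
        variable_importances=True,  # weight-based importances for the input features
    )
    model.train(x=[c for c in train.columns if c != "label"], y="label", training_frame=train)

    # Relative importance of each input feature, largest first.
    print(model.varimp(use_pandas=True))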
Train samples per iteration
The number of training data rows to be processed per iteration. Note that, independent of this parameter, each row is used immediately to update the model with (online) stochastic gradient descent; this parameter only controls the frequency at which scoring and model cancellation can happen. Special values are 0 (one epoch per iteration), -1 (process the maximum amount of data per iteration) and -2 (automatic mode, auto-tuning).
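The special values can be summarized in a small configuration sketch (H2O Python API for illustration; use exactly one of the alternatives shown in the comments):

    from h2o.estimators import H2ODeepLearningEstimator

    model = H2ODeepLearningEstimator(
        hidden=[50, 50],
        train_samples_per_iteration=-2,      # -2: automatic mode (auto-tuning)
        # train_samples_per_iteration=0,     #  0: one epoch per iteration
        # train_samples_per_iteration=-1,    # -1: as much data as possible per iteration
        # train_samples_per_iteration=10000, # fixed number of rows per iteration
    )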