Sliding Window Validation

Synopsis

This operator performs a sliding window validation for a machine learning model trained on time dependent input data.

Description

The operator creates sliding windows from the input data. In each validation step the training window is provided at the inner training set port of the Training subprocess. The size of the training window is defined by the parameter training window size. The training window can be used to train a machine learning model which has to be provided to the model port of the Training subprocess.

The test window of the input data is provided at the inner test set port of the Testing subprocess. Its size is defined by the parameter test window size. The model trained in the Training subprocess is provided at the model port of the Testing subprocess, where it can be applied to the test set. The performance of this prediction can be evaluated, and the resulting performance vector has to be provided to the performance port of the Testing subprocess. For the next validation fold, the training and the test windows are shifted by k values, where k is defined by the parameter step size.
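The fold layout described above can be sketched in a few lines of Python. This is only an illustration of example based windowing, not the operator's implementation. The function name `sliding_windows` is hypothetical, and the sketch assumes that the test window starts directly after the training window (a test window offset of 0) and that folds which do not fit completely into the series are dropped.

```python
from typing import List, Tuple

# (train_start, train_end, test_start, test_end); end positions are exclusive
Window = Tuple[int, int, int, int]


def sliding_windows(n_examples: int, training_window_size: int,
                    test_window_size: int, step_size: int) -> List[Window]:
    """Enumerate example based validation folds, starting at the first example.

    Assumptions (not taken from the operator documentation): the test window
    directly follows the training window, and incomplete folds are dropped.
    """
    if min(training_window_size, test_window_size, step_size) < 1:
        raise ValueError("window sizes and step size must be positive")
    folds = []
    start = 0
    while start + training_window_size + test_window_size <= n_examples:
        train_end = start + training_window_size
        folds.append((start, train_end, train_end, train_end + test_window_size))
        start += step_size  # shift both windows by k = step size
    return folds


for fold in sliding_windows(n_examples=20, training_window_size=10,
                            test_window_size=4, step_size=2):
    print(fold)
```

In every fold the test positions are strictly greater than the training positions, which is the property the validation relies on.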

The behavior described above is the default, example based windowing. It can be changed to time based windowing or custom windowing through the unit parameter. For time based windowing, the windowing parameters are specified as time durations/periods. For custom windowing, an additional ExampleSet has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows. For more details, see the unit parameter and the descriptions of the corresponding parameters.

Expert settings (for example no overlapping windows or the empty window handling) can be enabled by selecting the expert settings parameter.

The sliding window validation ensures that the machine learning model built in the Training subprocess is always evaluated on Examples that come after the training window.

If the model output port of the Sliding Window Validation operator is connected, a final window with the same size as the training windows, but ending at the last example of the input series, is used to train a final model. This final model is provided at the model output port.
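For example based windowing, the position of this final training window follows directly from the series length and the training window size. A minimal sketch, with the hypothetical helper name `final_training_window` and an exclusive end position chosen for illustration:

```python
from typing import Tuple


def final_training_window(n_examples: int, training_window_size: int) -> Tuple[int, int]:
    """Return (start, end) of the final training window, end exclusive.

    The window has the same size as the regular training windows and ends
    at the last example of the series.
    """
    if not 1 <= training_window_size <= n_examples:
        raise ValueError("training window size must be between 1 and the series length")
    return n_examples - training_window_size, n_examples


print(final_training_window(n_examples=20, training_window_size=10))
```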

This operator works on all time series (numerical, nominal and time series with date time values).

Input

example set

This input port receives the ExampleSet on which the sliding window validation is performed.

custom windows

The example set which contains the start (and stop) values of the custom windows. Only needs to be connected if the parameter unit is set to custom.

Output

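model

The final model, trained on a final window that has the same size as the training windows and ends at the last example of the input series. It is only trained if this port is connected.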

example set

The ExampleSet that was given as input is passed through without changes.

test result set

All test set ExampleSets, appended to one ExampleSet.

performance

This is an expandable port. You can connect any performance vector (result of a Performance operator) to the result port of the inner Testing subprocess. The performance output port delivers the average of the performances over all folds of the validation.

Parameters

Has indices

This parameter indicates if there is an index attribute associated with the time series. If this parameter is set to true, the index attribute has to be selected.

Indices attribute

If the parameter has indices is set to true, this parameter defines the associated index attribute. It can be either a date, date_time or numeric value type attribute. The attribute name can be selected from the drop down box of the parameter if the meta data is known.

Sort time series

If this parameter is selected, the input time series is sorted according to the selected indices attribute before the operator is applied. If it is not selected and the input time series is not sorted, a corresponding User Error is thrown.

Keep in mind that the index values still need to be unique. If they are not unique, a corresponding User Error is thrown.
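The interplay of sorting and the uniqueness requirement can be illustrated as follows. This is a sketch under assumptions: the function `prepare_index`, the row representation as dictionaries and the use of `ValueError` in place of a User Error are all chosen for illustration only.

```python
from typing import Any, Dict, List


def prepare_index(rows: List[Dict[str, Any]], index_key: str,
                  sort_time_series: bool) -> List[Dict[str, Any]]:
    """Check the index values and optionally sort the rows by them."""
    indices = [row[index_key] for row in rows]
    # Uniqueness is required regardless of the sorting option.
    if len(set(indices)) != len(indices):
        raise ValueError("index values must be unique")
    if sort_time_series:
        return sorted(rows, key=lambda row: row[index_key])
    # Without sorting, an unsorted series is rejected.
    if any(a > b for a, b in zip(indices, indices[1:])):
        raise ValueError("time series is not sorted; enable sort time series")
    return list(rows)


rows = [{"t": 3, "y": 0.3}, {"t": 1, "y": 0.1}, {"t": 2, "y": 0.2}]
print(prepare_index(rows, "t", sort_time_series=True))
```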

Expert settings

This parameter can be selected to show expert settings for a more detailed configuration of the operator. The expert settings are: windows defined, custom start point, custom end point, date format, no overlapping windows, and empty window handling.

Unit

The mode in which windows are defined. It sets the unit of the window parameters (training window size, step size, test window size and test window offset).

example based: The window parameters are specified in number of examples. This is the default option.
time based: The window parameters are specified as time durations/periods (units ranging from milliseconds to years).
custom: An additional example set has to be provided at the "custom windows" input port. It holds the start values (and optionally the stop values) of the windows.

Windows defined

This parameter defines the reference point from which the windows are laid out. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

from start: The first window will start at the first example of the input data set. The following windows are set up according to the window parameters.
from end: The last window will end at the last example of the input data set. The previous windows are set up according to the window parameters.
custom start: The first window will start at the custom start point provided by the parameter custom start point / custom start time. The following windows are set up according to the window parameters.
custom end: The last window will end at the custom end point provided by the parameter custom end point / custom end time. The previous windows are set up according to the window parameters.
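The difference between the from start and from end options can be sketched for example based windowing. The helper `fold_starts` is hypothetical. It treats one fold as a block of training window size + test window size examples, which assumes that the test window directly follows the training window.

```python
from typing import List


def fold_starts(n_examples: int, fold_length: int, step_size: int,
                windows_defined: str = "from start") -> List[int]:
    """Return the start position of every fold.

    fold_length is training window size + test window size (assumption:
    no gap between training and test window).
    """
    if windows_defined == "from start":
        # First fold starts at the first example; later folds follow.
        return list(range(0, n_examples - fold_length + 1, step_size))
    if windows_defined == "from end":
        # Last fold ends at the last example; earlier folds are stepped back.
        last_start = n_examples - fold_length
        return sorted(range(last_start, -1, -step_size))
    raise ValueError(f"unsupported option: {windows_defined}")


print(fold_starts(21, 14, 2, "from start"))
print(fold_starts(21, 14, 2, "from end"))
```

With 21 examples, a fold length of 14 and a step size of 2, the two options select different folds: from start leaves the last example unused, while from end leaves the first example unused.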

Custom start point

If the parameter windows defined is set to custom start and the unit is set to example based, this parameter defines the custom point from which the windows start. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

Custom end point

If the parameter windows defined is set to custom end and the unit is set to example based, this parameter defines the custom point where the windows end. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

Custom start time

If the parameter windows defined is set to custom start and the unit is set to time based, this parameter defines the custom date time point from which the windows start.

The date time format used to interpret the string provided in this parameter is defined by the parameter date format. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

Custom end time

If the parameter windows defined is set to custom end and the unit is set to time based, this parameter defines the custom date time point where the windows end.

Date format

Date format used for the custom start time and custom end time parameters. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

Training window size

The number of values in the training window. The ExampleSet provided at the training set port of the Training subprocess will contain training window size examples. The training window size has to be smaller than or equal to the length of the time series.

Training window size time

The time duration/period of the training window.

The example set provided at the training set port of the Training subprocess will have all examples which are in the corresponding training window.

The training window size time has to be smaller than or equal to the time duration of the time series.

No overlapping windows

If this parameter is set to true, the step size is determined automatically so that the training and test windows of different folds do not overlap. The step size is set to training window size + test window size. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

Step size

The step size between the first values of two consecutive windows. E.g. with a training window size of 10 and a step size of 2, the first training window has the values from 0, ..., 9, the second training window the values from 2, ..., 11 and so on. If no overlapping windows is set to true the step size is automatically determined depending on training window size and test window size.
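The example above, and the automatic step size used when no overlapping windows is set to true, can be reproduced with a short sketch. Both function names are hypothetical, and positions are zero-based with inclusive bounds to match the notation of the example.

```python
from typing import List, Tuple


def effective_step_size(training_window_size: int, test_window_size: int,
                        step_size: int, no_overlapping_windows: bool) -> int:
    """Step size actually used between two consecutive folds."""
    if no_overlapping_windows:
        # Windows of consecutive folds must not share any value.
        return training_window_size + test_window_size
    return step_size


def training_window_bounds(training_window_size: int, step: int,
                           n_windows: int) -> List[Tuple[int, int]]:
    """First and last position (both inclusive) of each training window."""
    return [(k * step, k * step + training_window_size - 1)
            for k in range(n_windows)]


print(training_window_bounds(10, 2, 3))  # [(0, 9), (2, 11), (4, 13)]
step = effective_step_size(10, 4, 2, no_overlapping_windows=True)
print(step, training_window_bounds(10, step, 2))  # 14 [(0, 9), (14, 23)]
```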

Step size time

The step size (in units of time) between the start points of two consecutive windows. E.g. with a training window size of 1 week and a step size of 2 days, the first training window has the days from 0, ..., 6, the second training window the days from 2, ..., 8 and so on. If no overlapping windows is set to true the step size time is automatically determined depending on training window size time, test window size time and test window offset time.
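A time based fold can be sketched with Python's standard `datetime` module. The helpers `time_fold` and `select` are hypothetical. The sketch assumes half-open windows (start included, end excluded), windows defined from the first timestamp, and a test window that directly follows the training window.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def time_fold(first: datetime, k: int, training_window_size: timedelta,
              test_window_size: timedelta,
              step_size: timedelta) -> Tuple[datetime, datetime, datetime]:
    """Return (train_start, train_end, test_end) of fold k; train_end is also test_start."""
    train_start = first + k * step_size
    train_end = train_start + training_window_size
    return train_start, train_end, train_end + test_window_size


def select(timestamps: List[datetime], start: datetime, end: datetime) -> List[datetime]:
    """All timestamps inside the half-open window [start, end)."""
    return [t for t in timestamps if start <= t < end]


# Daily series of 14 days, training window of 1 week, step size of 2 days.
days = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(14)]
train_start, train_end, test_end = time_fold(
    days[0], 1, timedelta(weeks=1), timedelta(days=2), timedelta(days=2))
print([t.day for t in select(days, train_start, train_end)])  # days 2, ..., 8 of the series
print([t.day for t in select(days, train_end, test_end)])
```

Because windows are selected by time and not by count, a window can contain fewer examples than expected, or none at all, when the series has gaps.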

Test window size

The number of values in the test window. The ExampleSet provided at the test set port of the Testing subprocess will contain test window size examples. The test window size has to be smaller than or equal to the length of the time series.

Test window size time

The time duration/period taken in the test window.

The ExampleSet provided at the test set port of the Testing subprocess will have the examples in the corresponding test windows. It will have an attribute holding the original time series values in the test window (attribute name is the name of the time series attribute parameter), and an attribute holding the values in the test window, forecasted by the forecast model from the Training subprocess (attribute name is forecast of <time series attribute>). In addition, the ExampleSet has an attribute with the forecast position, ranging from 1 to maximum number of test values. If the parameter has indices is set to true the ExampleSet has also an attribute holding the last index value of the training window.

Windows stop definition

Defines whether the end of each custom window is given by the start of the next window (the windows span the whole index range) or by an additional attribute.

from next window start: The end of each window is defined by the start of the next window (the windows span the whole index range). Training windows end at the start of the next test window. Test windows end at the start of the next training window. Be aware that the last value of the start definition values (the last value of the test window start attribute) is only used as the end of the final window.
from attribute: The end of each window is defined by additional attribute(s) in the custom window example set. The attribute names have to be provided by the parameters training window stop attribute and test window stop attribute.
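The from next window start option can be illustrated by interleaving the two start attributes and closing each window at the next start value. This is a sketch of one reading of the description above: the function name `windows_from_next_start` is hypothetical, and treating the end value as exclusive is an assumption.

```python
from typing import List, Sequence, Tuple


def windows_from_next_start(train_starts: Sequence[int],
                            test_starts: Sequence[int]) -> List[Tuple[str, int, int]]:
    """Derive (kind, start, end) windows, each ending at the next window start.

    Assumption: the end value is exclusive. The last test start value only
    closes the final window and does not open a window of its own.
    """
    boundaries = []
    for train_start, test_start in zip(train_starts, test_starts):
        boundaries.append(("training", train_start))
        boundaries.append(("test", test_start))
    return [(kind, start, next_start)
            for (kind, start), (_, next_start) in zip(boundaries, boundaries[1:])]


for window in windows_from_next_start([0, 10, 20], [7, 17, 27]):
    print(window)
```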

Training window start attribute

This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom training windows.

The training window start attribute, training window stop attribute, test window start attribute and test window stop attribute have to be of the same data type. If the data type is integer, the windowing is example based (see parameter unit); otherwise the attributes need to be of the same data type as the indices attribute.

Training window stop attribute

This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the end values for the custom training windows.

Test window start attribute

This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the start values for the custom test windows.

Test window stop attribute

This parameter defines the attribute in the custom window example set (the example set provided at the custom windows input port) which contains the stop values for the custom test windows.

Empty window handling

This parameter defines how empty windows (windows which do not contain an Example) will be handled. It is an expert setting and hence it is only shown if the parameter expert settings is selected.

add empty exampleset: Empty windows will be added as an empty ExampleSet, or a row with missing values.
skip: Empty windows will be skipped completely in the processing. If either the training or the test window is empty, the processing for both windows is skipped.
fail: A user error is thrown, if an empty window occurs.
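The three options can be summarized in a small dispatch function. This is a sketch only: the name `handle_fold`, the list representation of a window and the use of `ValueError` in place of a user error are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def handle_fold(train_rows: List[dict], test_rows: List[dict],
                empty_window_handling: str) -> Optional[Tuple[List[dict], List[dict]]]:
    """Return the windows to process, or None if the fold is skipped."""
    if train_rows and test_rows:
        return train_rows, test_rows  # nothing is empty, process as usual
    if empty_window_handling == "add empty exampleset":
        return train_rows, test_rows  # empty windows are passed on as they are
    if empty_window_handling == "skip":
        return None  # one empty window skips both windows of the fold
    if empty_window_handling == "fail":
        raise ValueError("empty window encountered")
    raise ValueError(f"unsupported option: {empty_window_handling}")


print(handle_fold([{"y": 1.0}], [], "skip"))
```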

Enable parallel execution

This parameter enables the parallel execution of the inner processes. Please disable the parallel execution if you run into memory problems.
