Generate Weight (LPR)
Synopsis
This operator uses the distance between an example's label value and the result of a local polynomial regression to determine the weight of this example.
Description
This operator weights the examples, so the resulting example set contains a new weight attribute. If the example set already contains a weight attribute, its values are used as initial values for this algorithm. Otherwise, each example is assigned an initial weight of 1.
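The initialization rule can be written down directly. This is a minimal sketch; the function name `initial_weights` and the use of `None` for "no weight attribute present" are illustrative assumptions, not part of the operator's interface.

```python
from typing import Optional, Sequence


def initial_weights(n_examples: int,
                    existing: Optional[Sequence[float]] = None) -> list[float]:
    """Return the starting weights for the reweighting procedure.

    If the example set already carries a weight attribute (``existing``),
    its values are used as initial weights; otherwise every example
    starts with a weight of 1.
    """
    if existing is None:
        return [1.0] * n_examples
    if len(existing) != n_examples:
        raise ValueError("one weight per example is required")
    return [float(w) for w in existing]
```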
To calculate the weights, this operator performs a local polynomial regression for each example. For more information about local polynomial regression, see the description of the Local Polynomial Regression operator.
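To make "a local polynomial regression for each example" concrete, here is a sketch for the simplest case. It assumes one numeric attribute, a polynomial of degree 1, the k nearest neighbors as the neighborhood, and a tricube kernel for the distance weights; the operator itself supports other degrees, neighborhood types, distance measures and smoothing kernels. The function names are hypothetical.

```python
from typing import Sequence


def tricube(u: float) -> float:
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_predict(xs: Sequence[float], ys: Sequence[float],
                         weights: Sequence[float], i: int, k: int) -> float:
    """Predict ys[i] from a degree-1 polynomial fitted to the k nearest
    neighbours of xs[i] (example i itself included).

    Each neighbour contributes with its example weight times a tricube
    kernel of its distance, scaled by the largest neighbour distance.
    """
    x0 = xs[i]
    order = sorted(range(len(xs)), key=lambda j: abs(xs[j] - x0))[:k]
    d_max = max(abs(xs[j] - x0) for j in order)
    # Weighted sums for the 2x2 normal equations of y = a + b * (x - x0).
    s0 = s1 = s2 = t0 = t1 = 0.0
    for j in order:
        dx = xs[j] - x0
        u = abs(dx) / d_max if d_max > 0 else 0.0
        w = weights[j] * tricube(u)
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * ys[j]
        t1 += w * dx * ys[j]
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        # Degenerate neighbourhood: fall back to a weighted mean.
        return t0 / s0 if s0 > 0 else float("nan")
    # The intercept a is the fitted value at x0.
    return (s2 * t0 - s1 * t1) / det
```

Centering the polynomial at `x0` means the fitted intercept is the prediction itself, so only the first coefficient of the solution is needed.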
After the predictions have been calculated, the residuals are computed and rescaled using their median.
The rescaled residuals are then transformed by a smooth function that cuts off values greater than a threshold. This means that examples without prediction error receive a weight of 1, while examples with an error greater than the threshold are down-weighted to 0.
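The residual, rescaling and cutoff steps can be sketched as follows. The text above only says that the residuals are rescaled using their median and passed through a smooth cutoff function; this sketch assumes the median of the absolute residuals, Tukey's bisquare as the smooth function, and a hypothetical threshold of 6 median units (the value used in Cleveland's LOWESS). None of these choices is confirmed for this operator.

```python
import statistics
from typing import Sequence


def bisquare(u: float) -> float:
    """Tukey's bisquare: 1 for u = 0, smoothly down to 0 for |u| >= 1."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def robustness_weights(ys: Sequence[float], predictions: Sequence[float],
                       threshold: float = 6.0) -> list[float]:
    """Turn prediction errors into weights in [0, 1].

    Residuals are divided by their median absolute value, so ``threshold``
    (a hypothetical parameter) is expressed in multiples of the typical
    error; examples whose scaled error reaches it get weight 0.
    """
    residuals = [y - p for y, p in zip(ys, predictions)]
    med = statistics.median([abs(r) for r in residuals])
    if med == 0.0:
        # At least half of the examples are fitted exactly: keep those at
        # weight 1 and cut off every example with a nonzero error.
        return [1.0 if r == 0.0 else 0.0 for r in residuals]
    return [bisquare(abs(r) / (threshold * med)) for r in residuals]
```

Dividing by the median rather than, say, the standard deviation keeps the scale itself robust: a single huge residual cannot inflate it and thereby hide its own size.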
This procedure is repeated as many times as specified by the user and results in weights that heavily penalize outliers. This is especially useful for algorithms based on least-squares optimization, such as Linear Regression, Polynomial Regression or Local Polynomial Regression, since least squares is very sensitive to outliers.
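The effect of iterating can be illustrated with a deliberately simplified stand-in: instead of one local polynomial regression per example, this sketch refits a single global weighted least-squares line in every iteration, then reweights with the same assumed median scaling and bisquare cutoff as above. Function names and the threshold of 6 are hypothetical.

```python
import statistics
from typing import Sequence


def weighted_line(xs: Sequence[float], ys: Sequence[float],
                  ws: Sequence[float]) -> tuple[float, float]:
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def iterate_weights(xs: Sequence[float], ys: Sequence[float],
                    iterations: int = 3, threshold: float = 6.0) -> list[float]:
    """Alternate a weighted fit with bisquare reweighting of its residuals."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        a, b = weighted_line(xs, ys, ws)
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        scale = threshold * statistics.median([abs(r) for r in residuals])
        if scale == 0.0:
            break  # perfect fit on at least half the data; keep current weights
        ws = [(1.0 - (r / scale) ** 2) ** 2 if abs(r) < scale else 0.0
              for r in residuals]
    return ws
```

For example, on points near `y = 2x + 1` with one grossly wrong label, the first iteration already assigns that example a weight of 0, and a weighted least-squares line using the returned weights recovers a slope close to 2, whereas the unweighted fit is pulled toward the outlier.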
Input
example set
This is an example set input port
Output
example set
This is an example set output port
Parameters
degree
Specifies the degree of the locally fitted polynomial. Keep in mind that degrees higher than 2 increase calculation time drastically and are likely to overfit.
ridge factor
Specifies the ridge factor, which is used to penalize large coefficients. Increasing it can help to avoid overfitting.
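How degree and ridge factor interact can be seen in a ridge-penalized weighted polynomial fit, which solves (XᵀWX + λI′)c = XᵀWy. This is a sketch under assumptions: a single numeric attribute, one fit over the given points (the operator repeats such a fit per neighborhood, with kernel-based weights), and a penalty on every coefficient except the intercept; whether the operator penalizes the intercept is not stated. The helper names are hypothetical.

```python
from typing import Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (a positive ridge factor helps).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_poly_fit(xs: Sequence[float], ys: Sequence[float],
                   ws: Sequence[float], degree: int,
                   ridge: float) -> list[float]:
    """Coefficients c[0..degree] minimising
    sum_i w_i * (y_i - sum_k c_k * x_i**k)**2 + ridge * sum_{k>=1} c_k**2.
    """
    p = degree + 1
    # Normal equations X^T W X c = X^T W y for the monomial basis x**k.
    xtwx = [[sum(w * x ** (j + k) for x, w in zip(xs, ws)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(w * y * x ** j for x, y, w in zip(xs, ys, ws)) for j in range(p)]
    for j in range(1, p):  # leave the intercept unpenalized
        xtwx[j][j] += ridge
    return solve(xtwx, xtwy)
```

The number of unknowns grows with the degree (and, with several attributes, combinatorially), which is one reason higher degrees are expensive; a larger ridge factor shrinks the non-intercept coefficients toward zero.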
iterations
The number of iterations performed for weight calculation. See operator description for details.
numerical measure
Selects the numerical measure used to compute distances between examples.
kernel type
The kernel type used by kernel-based numerical measures.
kernel gamma
The kernel parameter gamma.
kernel sigma1
The kernel parameter sigma1.
kernel sigma2
The kernel parameter sigma2.
kernel sigma3
The kernel parameter sigma3.
kernel degree
The kernel parameter degree.
kernel shift
The kernel parameter shift.
kernel a
The kernel parameter a.
kernel b
The kernel parameter b.
neighborhood type
Determines which type of neighborhood is used: a fixed number of neighbors, all neighbors within a given distance, or a mixed neighborhood combining both.
k
Specifies the number of neighbors in the neighborhood. Regardless of the local density, exactly this many examples are always returned.
fixed distance
Specifies the size of the neighborhood. All points within this distance are added.
relative size
Specifies the size of the neighborhood relative to the total number of examples. A value of 0.04 would include 4% of the data points into the neighborhood.
distance
Specifies the size of the neighborhood. All points within this distance are added.
at least
If the neighborhood contains fewer examples than this number, the distance is increased until this number is reached.
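The neighborhood parameters above can be summarized in one selection function. This is a sketch, not the operator's code: the type names `"fixed_number"`, `"fixed_distance"`, `"relative_number"` and `"mixed"` are hypothetical labels, Euclidean distance stands in for whatever numerical measure is selected, and rounding the relative size to the nearest integer is an assumption.

```python
import math
from typing import Sequence


def neighborhood(points: Sequence[Sequence[float]], query: Sequence[float],
                 kind: str, k: int = 10, fixed_distance: float = 1.0,
                 relative_size: float = 0.04, distance: float = 1.0,
                 at_least: int = 5) -> list[int]:
    """Return the indices of the neighbours of ``query``, nearest first.

    ``kind`` uses hypothetical names for the neighborhood types:
    "fixed_number", "fixed_distance", "relative_number" and "mixed".
    """
    # (distance, index) pairs sorted by Euclidean distance, ties by index.
    ranked = sorted((math.dist(p, query), i) for i, p in enumerate(points))
    order = [i for _, i in ranked]
    if kind == "fixed_number":
        return order[:k]
    if kind == "fixed_distance":
        return [i for d, i in ranked if d <= fixed_distance]
    if kind == "relative_number":
        n = max(1, round(relative_size * len(points)))
        return order[:n]
    if kind == "mixed":
        inside = [i for d, i in ranked if d <= distance]
        if len(inside) >= at_least:
            return inside
        # Growing the radius until at_least points fall inside amounts to
        # taking the at_least nearest points (up to ties at the boundary).
        return order[:at_least]
    raise ValueError(f"unknown neighborhood type: {kind!r}")
```

With 100 examples, a relative size of 0.04 selects the 4 nearest examples, matching the 4% in the parameter description.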
smoothing kernel
Determines which kernel type is used to calculate the weights of distant examples.
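A smoothing kernel maps each neighbor's distance to a weight that shrinks as the distance grows. The sketch below assumes the largest distance within the neighborhood serves as the bandwidth, as in classic LOESS; the kernel names in `KERNELS` are a hypothetical selection of common kernels, not necessarily the options the operator offers.

```python
import math
from typing import Callable, Sequence

# Common smoothing kernels on u = distance / bandwidth (hypothetical
# selection). Normalising constants are omitted because only the
# relative weights of the neighbours matter for the fit.
KERNELS: dict[str, Callable[[float], float]] = {
    "uniform": lambda u: 1.0 if u <= 1.0 else 0.0,
    "triangular": lambda u: max(0.0, 1.0 - u),
    "epanechnikov": lambda u: max(0.0, 1.0 - u * u),
    "tricube": lambda u: (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u),
}


def neighbor_weights(distances: Sequence[float],
                     kernel: str = "tricube") -> list[float]:
    """Weight each neighbour by a kernel of its distance, using the largest
    neighbour distance as the bandwidth."""
    k = KERNELS[kernel]
    bandwidth = max(distances)
    if bandwidth == 0.0:
        return [1.0] * len(distances)  # all neighbours coincide with the query
    return [k(d / bandwidth) for d in distances]
```

Compactly supported kernels such as tricube give the farthest neighbor a weight of exactly 0, while the Gaussian kernel keeps every neighbor at a small positive weight.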