
Generate Weight (LPR)

Synopsis

This operator uses the distance between an example's label value and the result of a local polynomial regression to determine the weight of this example.

Description

This operator weights the examples, so the resulting example set contains a new weight attribute. If the example set already includes a weight attribute, its values are used as initial values for this algorithm. If not, each example is assigned an initial weight of 1.

For calculating the weights, this operator will perform a local polynomial regression for each example. For more information about local polynomial regression, take a look at the operator description of the Local Polynomial Regression operator.

After the predicted result has been calculated, the residuals are computed and rescaled using their median.

This result is transformed by a smooth function that cuts off values greater than a threshold. This means that examples without prediction error receive a weight of 1, while examples with an error greater than the threshold are down-weighted to 0.

This procedure is iterated as often as specified by the user and results in weights that penalize outliers heavily. This is especially useful for algorithms that use least squares optimization, such as Linear Regression, Polynomial Regression or Local Polynomial Regression, since least squares is very sensitive to outliers.
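
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the operator's actual implementation: it works on a single numerical attribute, fits a degree-1 polynomial on the k nearest neighbors, uses a tricube smoothing kernel, and uses the bisquare function with a threshold of 6 rescaled residuals as the smooth cut-off. The names bisquare, local_linear_fit and lpr_weights are hypothetical.

```python
from statistics import median


def bisquare(u):
    """Smooth cut-off (assumed): 1 at u == 0, exactly 0 once |u| >= 1."""
    return (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def local_linear_fit(xs, ys, weights, x0, k):
    """Predict y at x0 from a weighted degree-1 fit on the k nearest examples."""
    idx = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in idx) or 1.0
    # Tricube smoothing kernel (assumed) times the current example weight.
    w = [weights[i] * (1.0 - (abs(xs[i] - x0) / h) ** 3) ** 3 for i in idx]
    sw = sum(w)
    if sw == 0.0:
        # Every neighbor is fully down-weighted: fall back to a plain mean.
        return sum(ys[i] for i in idx) / len(idx)
    mx = sum(wi * xs[i] for wi, i in zip(w, idx)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, idx)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, idx))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, idx))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)


def lpr_weights(xs, ys, iterations=2, k=5, threshold=6.0, initial_weights=None):
    """Iteratively down-weight examples whose label is far from the local fit."""
    weights = list(initial_weights) if initial_weights is not None else [1.0] * len(xs)
    for _ in range(iterations):
        residuals = [abs(y - local_linear_fit(xs, ys, weights, x, k))
                     for x, y in zip(xs, ys)]
        scale = median(residuals)  # rescale the residuals by their median
        if scale == 0.0:
            break  # at least half of the residuals are zero, nothing to rescale
        weights = [bisquare(r / (threshold * scale)) for r in residuals]
    return weights
```

Calling lpr_weights on data that follows a line except for one grossly wrong label gives that example a weight of 0. Its immediate neighbors are also down-weighted in the first iteration, because the outlier distorts their local fits.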

Input

example set

This is an example set input port

Output

example set

This is an example set output port

Parameters

degree

Specifies the degree of the locally fitted polynomial. Keep in mind that a degree higher than 2 greatly increases calculation time and is likely to cause overfitting.

ridge factor

Specifies the ridge factor. This factor is used to penalize high coefficients. Increasing it can help to avoid overfitting.
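
To illustrate how a ridge factor penalizes high coefficients, the following sketch solves a weighted polynomial least squares problem with the ridge factor added to the diagonal of the normal equations. It is an assumption-based illustration for a single numerical attribute, not the operator's code. The names solve and ridge_polyfit are hypothetical, and penalizing the intercept along with the other coefficients is a simplification.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_polyfit(xs, ys, weights, degree, ridge_factor):
    """Return polynomial coefficients, lowest power first."""
    n = degree + 1
    rows = [[x ** p for p in range(n)] for x in xs]
    # Normal equations (X^T W X + ridge_factor * I) beta = X^T W y.
    a = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
          + (ridge_factor if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(w * r[i] * y for w, r, y in zip(weights, rows, ys)) for i in range(n)]
    return solve(a, b)
```

With a ridge factor of 0 this reduces to ordinary weighted least squares. Larger values shrink the coefficients toward 0, which trades some fit quality for stability.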

iterations

The number of iterations performed for weight calculation. See operator description for details.

numerical measure

Select measure

kernel type

The kernel type

kernel gamma

The kernel parameter gamma.

kernel sigma1

The kernel parameter sigma1.

kernel sigma2

The kernel parameter sigma2.

kernel sigma3

The kernel parameter sigma3.

kernel degree

The kernel parameter degree.

kernel shift

The kernel parameter shift.

kernel a

The kernel parameter a.

kernel b

The kernel parameter b.

neighborhood type

Determines which type of neighborhood is used: a fixed number of neighbors, all neighbors within a distance, or a mix of both.

k

Specifies the number of neighbors in the neighborhood. Regardless of the local density, exactly this many examples are returned.

fixed distance

Specifies the size of the neighborhood. All points within this distance are added.

relative size

Specifies the size of the neighborhood relative to the total number of examples. A value of 0.04 includes 4% of the data points in the neighborhood.

distance

Specifies the size of the neighborhood. All points within this distance are added.

at least

If the neighborhood count is less than this number, the distance is increased until this number is met.
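
The neighborhood parameters above can be read as four selection strategies. The sketch below shows one possible interpretation for a single numerical attribute with absolute distance. The function names are hypothetical and do not correspond to the operator's option names, and rounding the relative size to a neighbor count is an assumption.

```python
def _by_distance(points, query):
    """Indices of all points, ordered by absolute distance to the query."""
    return sorted(range(len(points)), key=lambda i: abs(points[i] - query))


def fixed_number(points, query, k):
    """Always return the k nearest points, regardless of local density."""
    return _by_distance(points, query)[:k]


def fixed_distance(points, query, distance):
    """Return every point within the given distance, nearest first."""
    return [i for i in _by_distance(points, query)
            if abs(points[i] - query) <= distance]


def relative_number(points, query, relative_size):
    """Return a share of all points, e.g. 0.04 for the nearest 4%."""
    k = max(1, round(relative_size * len(points)))
    return fixed_number(points, query, k)


def distance_but_at_least(points, query, distance, at_least):
    """Points within the distance, widened until at_least points are found."""
    inside = fixed_distance(points, query, distance)
    if len(inside) >= at_least:
        return inside
    # Taking the at_least nearest points is equivalent to growing the
    # distance until the required count is reached (ignoring ties).
    return fixed_number(points, query, at_least)
```

A fixed number of neighbors adapts the neighborhood radius to the local density, while a fixed distance keeps the radius constant and lets the neighbor count vary. The mixed variant guards against empty or tiny neighborhoods in sparse regions.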

smoothing kernel

Determines which kernel type is used to calculate the weights of distant examples.