Performance (Regression)

Synopsis

This operator evaluates the statistical performance of regression tasks and delivers a list of performance criteria values for the regression task.

Description

This operator should be used for performance evaluation of regression tasks only. RapidMiner provides many other performance evaluation operators, e.g. the Performance operator, the Performance (Binominal Classification) operator and the Performance (Classification) operator. Unlike the Performance (Regression) operator, the Performance operator automatically determines the learning task type and calculates the most common criteria for that type. You can use the Performance (User-Based) operator if you want to write your own performance measure.

Regression is a technique used for numerical prediction. It is a statistical method that attempts to determine the strength of the relationship between one dependent variable (i.e. the label attribute) and a series of other changing variables known as independent variables (the regular attributes). Just as classification is used for predicting categorical labels, regression is used for predicting a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience, or the potential sales of a new product given its price. Regression is often used to determine how much specific factors, such as the price of a commodity, interest rates, or particular industries or sectors, influence the price movement of an asset.

To evaluate the statistical performance of a regression model, the data set must be labeled, i.e. it must have an attribute with the label role and an attribute with the prediction role. The label attribute stores the actual observed values, whereas the prediction attribute stores the label values predicted by the regression model under evaluation.

Input

labeled data

This input port expects a labeled ExampleSet. The <reference key="operator.apply_model">Apply Model</reference> operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has both a label attribute and a prediction attribute. See the <reference key="operator.set_role">Set Role</reference> operator for more details regarding the label and prediction roles of attributes.

performance

This is an optional port. It expects a Performance Vector.

Output

performance

This port delivers a Performance Vector (called the output-performance-vector here). A Performance Vector is a list of performance criteria values, calculated on the basis of the label and prediction attributes of the input ExampleSet. The output-performance-vector contains the performance criteria calculated by this Performance operator (called the calculated-performance-vector here). If a Performance Vector was also fed into the performance input port (called the input-performance-vector here), its criteria are added to the output-performance-vector as well. If the input-performance-vector and the calculated-performance-vector contain the same criterion with different values, the value from the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the <reference key="process.performance_classification.performance_port">Example Process</reference> of the Performance (Classification) operator.
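The merging rule described above can be pictured with plain dictionaries. This is only an illustrative sketch: the function name `merge_performance_vectors`, the criterion names and the values are made up for the example, and it does not show how RapidMiner implements Performance Vectors internally.

```python
def merge_performance_vectors(input_vector, calculated_vector):
    """Combine two criterion -> value mappings (hypothetical helper).

    Criteria from the input vector are kept, but whenever both vectors
    contain the same criterion the calculated value takes precedence.
    """
    merged = dict(input_vector)
    merged.update(calculated_vector)
    return merged


# Made-up values purely for illustration.
input_vector = {"absolute_error": 2.0, "correlation": 0.8}
calculated_vector = {"root_mean_squared_error": 1.5, "absolute_error": 1.2}

output_vector = merge_performance_vectors(input_vector, calculated_vector)
print(output_vector)
```

In this sketch `correlation` survives from the input vector, while `absolute_error` carries the newly calculated value.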

example set

The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Main criterion

The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization process setups. If no main criterion is selected, the first criterion in the resulting performance vector will be assumed to be the main criterion.
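The fallback rule for the main criterion can be sketched as follows. The helper `main_criterion` and the dictionary representation are assumptions made for this example, not part of RapidMiner; the sketch relies on Python dictionaries preserving insertion order.

```python
def main_criterion(performance_vector, selected=None):
    """Return the name of the criterion used for comparisons (hypothetical helper)."""
    if selected is not None:
        return selected
    # No explicit choice: fall back to the first criterion in the vector.
    return next(iter(performance_vector))


# Made-up values purely for illustration.
performance_vector = {"root_mean_squared_error": 1.22, "absolute_error": 1.0}

default_choice = main_criterion(performance_vector)
explicit_choice = main_criterion(performance_vector, "absolute_error")
```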

Root mean squared error

The root mean squared error, i.e. the square root of the average of the squared deviations of the prediction from the actual value.
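As a rough illustration, the usual textbook definition of the root mean squared error can be computed like this. The function name and the sample values are made up, and details such as example weights or undefined values are ignored.

```python
import math


def root_mean_squared_error(labels, predictions):
    """Square root of the mean squared difference between label and prediction."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


labels = [3.0, 5.0, 2.0, 7.0]        # actual values (label attribute)
predictions = [2.0, 5.0, 4.0, 8.0]   # predicted values (prediction attribute)

rmse = root_mean_squared_error(labels, predictions)
print(rmse)
```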

Absolute error

The average absolute deviation of the prediction from the actual value. The values of the label attribute are the actual values.
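A minimal sketch of the average absolute deviation, using made-up sample values and a hypothetical function name:

```python
def absolute_error(labels, predictions):
    """Mean of |prediction - actual| over all examples."""
    deviations = [abs(p - y) for y, p in zip(labels, predictions)]
    return sum(deviations) / len(deviations)


labels = [3.0, 5.0, 2.0, 7.0]        # actual values (label attribute)
predictions = [2.0, 5.0, 4.0, 8.0]   # predicted values (prediction attribute)

mae = absolute_error(labels, predictions)
print(mae)  # 1.0
```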

Relative error

The average relative error is the average of the absolute deviation of the prediction from the actual value divided by the actual value. The values of the label attribute are the actual values.

Relative error lenient

The average lenient relative error is the average of the absolute deviation of the prediction from the actual value divided by the maximum of the actual value and the prediction. The values of the label attribute are the actual values.

Relative error strict

The average strict relative error is the average of the absolute deviation of the prediction from the actual value divided by the minimum of the actual value and the prediction. The values of the label attribute are the actual values.
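The three relative error variants differ only in the denominator, which the following sketch makes explicit. It follows the wording of the descriptions literally and assumes that all actual values and predictions are positive; how zero or negative values are treated is not specified here, so the sketch does not handle them. The function names and sample values are made up.

```python
def _mean(values):
    return sum(values) / len(values)


def relative_error(labels, predictions):
    """Absolute deviation divided by the actual value."""
    return _mean([abs(p - y) / y for y, p in zip(labels, predictions)])


def relative_error_lenient(labels, predictions):
    """Absolute deviation divided by the larger of actual value and prediction."""
    return _mean([abs(p - y) / max(y, p) for y, p in zip(labels, predictions)])


def relative_error_strict(labels, predictions):
    """Absolute deviation divided by the smaller of actual value and prediction."""
    return _mean([abs(p - y) / min(y, p) for y, p in zip(labels, predictions)])


labels = [4.0, 10.0]       # actual values, assumed positive
predictions = [5.0, 8.0]   # predictions, assumed positive

rel = relative_error(labels, predictions)
lenient = relative_error_lenient(labels, predictions)
strict = relative_error_strict(labels, predictions)
print(rel, lenient, strict)
```

Because the lenient variant divides by the larger value and the strict variant by the smaller one, the lenient error is never larger than the strict error for positive values.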

Normalized absolute error

The absolute error divided by the error that would have been made if the average had been predicted.
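One way to read this definition is sketched below. It assumes that "the average" means the mean of the actual label values, which is the common convention; the function name and sample values are made up.

```python
def normalized_absolute_error(labels, predictions):
    """Total absolute error relative to always predicting the label mean."""
    label_mean = sum(labels) / len(labels)
    model_error = sum(abs(p - y) for y, p in zip(labels, predictions))
    # Error of the baseline that predicts the label mean for every example
    # (assumption: the baseline uses the mean of the actual values).
    baseline_error = sum(abs(label_mean - y) for y in labels)
    return model_error / baseline_error


labels = [3.0, 5.0, 2.0, 6.0]        # actual values, mean is 4.0
predictions = [2.0, 5.0, 4.0, 7.0]   # predicted values

nae = normalized_absolute_error(labels, predictions)
print(nae)
```

Under this reading, a value below 1 means the model does better than the constant mean prediction.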

Root relative squared error

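The root relative squared error, i.e. the square root of the squared error relative to the squared error that would have been made if the average had been predicted.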
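The sketch below uses the common textbook definition of the root relative squared error, with the mean of the actual label values as the baseline prediction. Whether the operator matches this definition in every detail is an assumption; the function name and sample values are made up.

```python
import math


def root_relative_squared_error(labels, predictions):
    """sqrt(sum of squared errors / sum of squared deviations from the label mean)."""
    label_mean = sum(labels) / len(labels)
    model_error = sum((p - y) ** 2 for y, p in zip(labels, predictions))
    # Baseline: always predict the mean of the actual values (assumption).
    baseline_error = sum((label_mean - y) ** 2 for y in labels)
    return math.sqrt(model_error / baseline_error)


labels = [3.0, 5.0, 2.0, 6.0]        # actual values, mean is 4.0
predictions = [2.0, 5.0, 4.0, 7.0]   # predicted values

rrse = root_relative_squared_error(labels, predictions)
print(rrse)
```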

Squared error

The average of the squared deviations of the prediction from the actual value.

Correlation

Returns the correlation coefficient between the label and prediction attributes.

Squared correlation

Returns the squared correlation coefficient between the label and prediction attributes.
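Assuming that the correlation coefficient here is the Pearson correlation, both criteria can be sketched as follows. The function name and sample values are made up.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


labels = [1.0, 2.0, 3.0, 4.0, 5.0]        # actual values
predictions = [2.0, 1.0, 4.0, 3.0, 5.0]   # predicted values

correlation = pearson_correlation(labels, predictions)
squared_correlation = correlation ** 2
print(correlation, squared_correlation)
```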

Prediction average

Returns the average of all the predictions. All the predicted values are added and the sum is divided by the total number of predictions.

Spearman rho

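The rank correlation between the actual and predicted labels, using Spearman's rho. Spearman's rho measures the strength of the monotonic relationship between two variables; it is the correlation coefficient computed on the ranks of the values rather than on the values themselves. The two variables in this case are the label and the prediction attribute.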
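A common way to compute Spearman's rho is to replace the values by their ranks and then compute the Pearson correlation of the ranks. The sketch below does this with average ranks for ties; the helper names and sample values are made up, and it is not taken from the operator's implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_correlation(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    return pearson_correlation(average_ranks(x), average_ranks(y))


labels = [10.0, 20.0, 30.0, 40.0, 50.0]     # actual values
predictions = [1.0, 8.0, 27.0, 64.0, 125.0]  # monotonic but non-linear in the labels

rho = spearman_rho(labels, predictions)
linear = pearson_correlation(labels, predictions)
print(rho, linear)
```

In this example the predictions increase whenever the labels increase, so the rank correlation is perfect even though the plain correlation on the raw values is below 1.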

Kendall tau

The rank correlation between the actual and predicted labels, using Kendall's tau-b. Kendall's tau is a rank-based measure of correlation and thus measures the strength of the relationship between two variables. The two variables in this case are the label and the prediction attribute.
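Kendall's tau-b compares every pair of examples and corrects for ties. The following sketch uses the standard pairwise definition with a simple quadratic loop; the function name and sample values are made up, and it is not taken from the operator's implementation.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b via pair counting (quadratic in the number of examples)."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1   # both variables order the pair the same way
            elif dx * dy < 0:
                discordant += 1   # the variables order the pair in opposite ways
    total_pairs = n * (n - 1) // 2
    denominator = math.sqrt((total_pairs - tied_x) * (total_pairs - tied_y))
    return (concordant - discordant) / denominator


labels = [1.0, 2.0, 3.0, 4.0, 5.0]        # actual values
predictions = [2.0, 1.0, 4.0, 3.0, 5.0]   # predicted values

tau = kendall_tau_b(labels, predictions)
tau_with_ties = kendall_tau_b([1.0, 2.0, 2.0, 3.0], [1.0, 2.0, 3.0, 3.0])
print(tau, tau_with_ties)
```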

Skip undefined labels

If set to true, examples with undefined labels are skipped.
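The effect of skipping can be sketched as a filter applied before any criterion is computed. Representing an undefined label as NaN is an assumption made for this example, as are the function names and sample values.

```python
import math


def skip_undefined_labels(labels, predictions):
    """Drop examples whose label is undefined (represented here as NaN)."""
    kept = [(y, p) for y, p in zip(labels, predictions) if not math.isnan(y)]
    return [y for y, _ in kept], [p for _, p in kept]


def absolute_error(labels, predictions):
    deviations = [abs(p - y) for y, p in zip(labels, predictions)]
    return sum(deviations) / len(deviations)


labels = [3.0, float("nan"), 5.0]   # second example has no defined label
predictions = [2.0, 4.0, 7.0]

clean_labels, clean_predictions = skip_undefined_labels(labels, predictions)
mae = absolute_error(clean_labels, clean_predictions)
print(mae)  # 1.5
```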

Comparator class

This is an expert parameter. It specifies the fully qualified class name of the PerformanceComparator implementation.

Use example weights

This parameter allows example weights to be used for statistical performance calculations if possible. To take example weights into account, the ExampleSet must have an attribute with the weight role; the parameter has no effect if no attribute has the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. Study the <reference key="operator.set_roles">Set Roles</reference> operator for more information regarding the weight role.
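How weights might enter a criterion can be illustrated with a weighted average of the absolute deviations. This is only one plausible reading: the exact way the operator applies weights to each criterion is not described here, and the function name and sample values are made up.

```python
def weighted_absolute_error(labels, predictions, weights):
    """Weighted mean of |prediction - actual| (assumed weighting scheme)."""
    weighted_sum = sum(w * abs(p - y) for y, p, w in zip(labels, predictions, weights))
    return weighted_sum / sum(weights)


labels = [3.0, 5.0]        # actual values
predictions = [2.0, 8.0]   # predicted values
weights = [3.0, 1.0]       # values of the attribute with the weight role

weighted = weighted_absolute_error(labels, predictions, weights)
unweighted = weighted_absolute_error(labels, predictions, [1.0, 1.0])
print(weighted, unweighted)  # 1.5 2.0
```

The first example has a small error and a large weight, so the weighted result is lower than the unweighted one.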
