Skip to main content

Remove Useless Attributes

Synopsis

This operator removes useless attributes from an ExampleSet. The thresholds for useless attributes are specified by the user.

Description

The Remove Useless Attributes operator removes four kinds of useless attributes:

  1. Such nominal attributes where the most frequent value is contained in more than the specified ratio of all examples. The ratio is specified by the nominal useless above parameter. This ratio is defined as the number of examples with most frequent attribute value divided by the total number of examples. This property can be used for removing such nominal attributes where one value dominates all other values.
  2. Such nominal attributes where the most frequent value is contained in less than the specified ratio of all examples. The ratio is specified by the nominal useless below parameter. This ratio is defined as the number of examples with most frequent attribute value divided by the total number of examples. This property can be used for removing nominal attributes with too many possible values.
  3. Such numerical attributes where the Standard Deviation is less than or equal to a given deviation threshold. The numerical min deviation parameter specifies the deviation threshold. The Standard Deviation is a measure of how spread out values are. Standard Deviation is the square root of the Variance which is defined as the average of the squared differences from the Mean.
  4. Such nominal attributes where the value of all examples is unique. This property can be used to remove id-like attributes.

Input

example set input

This input port expects an ExampleSet. It is the output of the Filter Examples operator in the attached Example Process. The output of other operators can also be used as input.

Output

example set output

The attributes that satisfy the user-defined criteria for useless attributes are removed from the ExampleSet and this ExampleSet is delivered through this output port.

original

The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

Numerical min deviation

The numerical min deviation parameter specifies the deviation threshold. Such numerical attributes where Standard Deviation is less than or equal to this deviation threshold are removed from the input ExampleSet. The Standard Deviation is a measure of how spread out values are. Standard Deviation is the square root of the Variance which is defined as the average of the squared differences from the Mean.

Nominal useless above

The nominal useless above parameter specifies the ratio of the number of examples with most frequent value to the total number of examples. Such nominal attributes where the ratio of the number of examples with most frequent value to the total number of examples is more than this ratio are removed from the input ExampleSet. This property can be used to remove such nominal attributes where one value dominates all other values.

Nominal remove id like

If this parameter is set to true, all such nominal attributes where the value of all examples is unique are removed from the input ExampleSet. This property can be used to remove id-like attributes.

Nominal useless below

The nominal useless below parameter specifies the ratio of the number of examples with most frequent value to the total number of examples. Such nominal attributes where the ratio of the number of examples with most frequent value to the total number of examples is less than this ratio are removed from the input ExampleSet. This property can be used to remove nominal attributes with too many possible values.