Remove Useless Attributes
Synopsis
This operator removes useless attributes from an ExampleSet. The user specifies the thresholds that determine when an attribute counts as useless.
Description
The Remove Useless Attributes operator removes four kinds of useless attributes:
- Nominal attributes whose most frequent value occurs in more than a specified ratio of all examples. The ratio is set by the nominal useless above parameter and is defined as the number of examples with the most frequent attribute value divided by the total number of examples. Use this criterion to remove nominal attributes in which one value dominates all others.
- Nominal attributes whose most frequent value occurs in less than a specified ratio of all examples. The ratio is set by the nominal useless below parameter and is defined in the same way: the number of examples with the most frequent attribute value divided by the total number of examples. Because even the most frequent value is rare in such attributes, use this criterion to remove nominal attributes with too many possible values.
- Numerical attributes whose standard deviation is less than or equal to a given threshold, set by the numerical min deviation parameter. The standard deviation measures how spread out the values are. It is the square root of the variance, which is the average of the squared differences from the mean.
- Nominal attributes in which every example has a unique value. This check is enabled by the nominal remove id like parameter and can be used to remove id-like attributes.
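The four criteria above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the operator's actual implementation. The function names, the dictionary-of-lists data layout, the rule for telling numerical from nominal columns, and the default values are all assumptions made for this sketch.

```python
from collections import Counter
from statistics import pstdev


def most_frequent_ratio(values):
    """Share of examples that carry the most frequent value."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def useless_attributes(columns,
                       numerical_min_deviation=0.0,
                       nominal_useless_above=1.0,
                       nominal_remove_id_like=False,
                       nominal_useless_below=0.0):
    """Return the names of attributes flagged by the four criteria.

    `columns` maps an attribute name to its list of values (hypothetical
    layout). A column whose values are all int/float is treated as
    numerical, anything else as nominal; this is a simplification.
    The defaults are placeholders for this sketch only.
    """
    flagged = []
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            # Numerical: standard deviation at or below the threshold.
            # pstdev follows the "average of squared differences" definition.
            if pstdev(values) <= numerical_min_deviation:
                flagged.append(name)
            continue
        ratio = most_frequent_ratio(values)
        id_like = len(set(values)) == len(values)
        if (ratio > nominal_useless_above            # one value dominates
                or ratio < nominal_useless_below     # too many distinct values
                or (nominal_remove_id_like and id_like)):
            flagged.append(name)
    return flagged


data = {
    "country": ["DE", "DE", "DE", "DE"],        # ratio 1.0
    "color": ["red", "red", "blue", "green"],   # ratio 0.5
    "id": ["a1", "a2", "a3", "a4"],             # every value unique
    "constant": [5.0, 5.0, 5.0, 5.0],           # standard deviation 0
    "height": [1.0, 2.0, 3.0, 4.0],             # standard deviation > 0
}

removed = useless_attributes(
    data,
    numerical_min_deviation=0.0,
    nominal_useless_above=0.9,
    nominal_remove_id_like=True,
    nominal_useless_below=0.1,
)
print(removed)
```

With these thresholds, country is flagged because one value dominates, id because every value is unique, and constant because its values do not vary. The attributes color and height are kept.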
Input
example set input
This input port expects an ExampleSet. It is the output of the Filter Examples operator in the attached Example Process. The output of other operators can also be used as input.
Output
example set output
Attributes that meet the user-defined criteria for useless attributes are removed from the ExampleSet, and the resulting ExampleSet is delivered through this output port.
original
The ExampleSet that was given as input is passed through this port unchanged. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Numerical min deviation
The numerical min deviation parameter specifies the deviation threshold. Numerical attributes whose standard deviation is less than or equal to this threshold are removed from the input ExampleSet. The standard deviation measures how spread out the values are. It is the square root of the variance, which is the average of the squared differences from the mean.
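Written out, the definition above corresponds to the following formula. The symbols are assumptions introduced here for illustration: x_i is the value of the attribute in example i, N is the number of examples, and t is the value of the numerical min deviation parameter. The formula follows the wording of this description (division by N); whether the operator itself divides by N or by N - 1 is not stated here.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2},
\qquad
\text{attribute is removed if } \sigma \le t
```

For example, an attribute with the values 1, 2, 3, 4 has a mean of 2.5 and a variance of 1.25, so its standard deviation is about 1.118. It would be kept with a threshold of 1.0 and removed with a threshold of 1.2.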
Nominal useless above
The nominal useless above parameter specifies an upper limit for the ratio of the number of examples with the most frequent value to the total number of examples. Nominal attributes for which this ratio is greater than the specified value are removed from the input ExampleSet. Use this parameter to remove nominal attributes in which one value dominates all others.
Nominal remove id like
If this parameter is set to true, all nominal attributes in which every example has a unique value are removed from the input ExampleSet. Use this parameter to remove id-like attributes.
Nominal useless below
The nominal useless below parameter specifies a lower limit for the ratio of the number of examples with the most frequent value to the total number of examples. Nominal attributes for which this ratio is less than the specified value are removed from the input ExampleSet. Use this parameter to remove nominal attributes with too many possible values.