Weight by Information Gain
Synopsis
This operator calculates the relevance of the attributes based on information gain and assigns weights to them accordingly.
Description
The Weight by Information Gain operator calculates the weight of each attribute with respect to the class attribute using information gain. The higher the weight of an attribute, the more relevant it is considered. Note that this operator can only be applied to ExampleSets with a nominal label.
Although information gain is usually a good measure of an attribute's relevance, it is not perfect. A notable problem occurs when it is applied to attributes that can take on a large number of distinct values. For example, consider a data set that describes the customers of a business. When information gain is used to decide which attributes are most relevant, the customer's credit card number may receive a high information gain, because it uniquely identifies each customer and splitting on it therefore leaves no uncertainty about the label. Such an attribute says nothing about unseen customers, so we usually do not want to assign it a high weight.
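The problem described above can be reproduced with a minimal sketch of the standard entropy-based definition of information gain. This is an illustration only, not the operator's actual implementation; the function names, the `card_number` and `plan` attributes, and the toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of nominal labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Label entropy minus the weighted label entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: six customers with a nominal label.
label = ["yes", "yes", "yes", "no", "no", "no"]
card_number = ["c1", "c2", "c3", "c4", "c5", "c6"]  # unique per customer
plan = ["basic", "basic", "premium", "premium", "premium", "basic"]

ig_card = information_gain(card_number, label)
ig_plan = information_gain(plan, label)

# The unique identifier receives the full label entropy as its gain,
# even though it carries no information about new customers.
print(f"card_number: {ig_card:.3f}")
print(f"plan:        {ig_plan:.3f}")
```

Because every value of `card_number` occurs exactly once, each split group is pure and the attribute obtains the maximum possible gain, which is larger than the gain of the genuinely descriptive `plan` attribute.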
Information gain ratio is sometimes used instead. This measure is biased against attributes with a large number of distinct values. However, attributes with very low information values then appear to receive an unfair advantage. The Weight by Information Gain Ratio operator uses information gain ratio to generate attribute weights.
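As a rough illustration of how the ratio penalizes many-valued attributes, the sketch below uses the common definition in which information gain is divided by the entropy of the attribute's own value distribution (the "split information"). Whether the Weight by Information Gain Ratio operator computes exactly this quantity is an assumption; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """Label entropy minus the weighted label entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)
    if split_info == 0:  # constant attribute: no split, so no usable ratio
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data: both attributes separate the label perfectly.
label = ["yes", "yes", "no", "no"]
customer_id = ["c1", "c2", "c3", "c4"]          # four distinct values
region = ["north", "north", "south", "south"]   # two distinct values

ratio_id = gain_ratio(customer_id, label)
ratio_region = gain_ratio(region, label)
print(f"customer_id: {ratio_id:.2f}")
print(f"region:      {ratio_region:.2f}")
```

Both attributes have the same information gain here, but the identifier's larger split information halves its ratio, so `region` ends up with the higher score.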
Input
example set
This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.
Output
weights
This port delivers the weights of the attributes with respect to the label attribute. Attributes with higher weights are considered more relevant.
example set
The ExampleSet that was given as input is passed unchanged to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.
Parameters
Normalize weights
This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range from 0 to 1.
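One plausible way to map weights into the range from 0 to 1 is min-max scaling, sketched below. The description above does not state which scaling the operator applies, so the method, the function name, and the handling of identical weights are assumptions made for illustration.

```python
def normalize_weights(weights):
    """Min-max scale a mapping of attribute name -> weight into [0, 1].

    Assumption: the smallest weight maps to 0 and the largest to 1.
    """
    lowest = min(weights.values())
    highest = max(weights.values())
    if highest == lowest:
        # Assumption: if all weights are identical, treat them as equally relevant.
        return {name: 1.0 for name in weights}
    span = highest - lowest
    return {name: (w - lowest) / span for name, w in weights.items()}


# Hypothetical raw information gain values.
raw = {"age": 0.25, "income": 0.75, "region": 0.5}
normalized = normalize_weights(raw)
print(normalized)
```

Under this scheme the ranking of the attributes is preserved; only the scale changes.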
Sort weights
This parameter indicates whether the attributes should be sorted according to their weights in the results. If set to true, the sort order is specified by the sort direction parameter.
Sort direction
This parameter is only available when the sort weights parameter is set to true. It specifies the order in which the attributes are sorted according to their weights.
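The effect of the two sorting parameters can be pictured with a short sketch. The `ascending` flag stands in for the sort direction parameter, whose actual option names are not listed here, and the function name and weights are hypothetical.

```python
def sort_weights(weights, ascending=True):
    """Return (attribute, weight) pairs ordered by weight.

    `ascending` is a hypothetical stand-in for the sort direction parameter.
    """
    return sorted(weights.items(), key=lambda item: item[1], reverse=not ascending)


# Hypothetical attribute weights.
weights = {"age": 0.25, "income": 0.75, "region": 0.5}

least_relevant_first = sort_weights(weights, ascending=True)
most_relevant_first = sort_weights(weights, ascending=False)
print(most_relevant_first)
```

Sorting in descending order puts the most relevant attributes at the top of the result, which is usually the more convenient view when selecting attributes.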