Optimize by Generation (YAGGA)
Synopsis
This operator may select some attributes from the original attribute set and may also generate new attributes from it. YAGGA (Yet Another Generating Genetic Algorithm) does not change the original number of attributes unless adding attributes, removing attributes, or both proves to yield a better fitness.
Description
Sometimes the selection of features alone is not sufficient, and other transformations of the feature space must be performed. Generating new attributes from the given attributes extends the feature space, and a hypothesis may be easier to find in the extended feature space. This operator can be considered a blend of attribute selection and attribute generation: it may select some attributes from the original set and it may also generate new attributes from the original attributes. The (generating) mutation does one of the following, with the stated probabilities:
- Probability p/4: Add a newly generated attribute to the feature vector.
- Probability p/4: Add a randomly chosen original attribute to the feature vector.
- Probability p/2: Remove a randomly chosen attribute from the feature vector.
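The three mutation cases above can be sketched in a few lines of Python. This is only an illustration, not the operator's implementation: it assumes that p is the overall mutation probability (so nothing happens with probability 1 - p), that an individual is a list of attribute names, that an original attribute already present is not added twice, and that the last remaining attribute is never removed. The names `generating_mutation` and `sum_of_two` are hypothetical.

```python
import random


def generating_mutation(individual, original_attributes, p, rng, generate_attribute):
    """Apply one YAGGA-style mutation step to a list of attribute names.

    p/4: add a newly generated attribute, p/4: add a random original
    attribute, p/2: remove a random attribute, 1 - p: no change (assumed).
    """
    mutated = list(individual)  # work on a copy, leave the input untouched
    r = rng.random()
    if r < p / 4:
        mutated.append(generate_attribute(mutated, rng))
    elif r < p / 2:
        candidate = rng.choice(original_attributes)
        # Assumption: do not add an attribute that is already present.
        if candidate not in mutated:
            mutated.append(candidate)
    elif r < p:
        # Assumption: never remove the last remaining attribute.
        if len(mutated) > 1:
            mutated.pop(rng.randrange(len(mutated)))
    return mutated


def sum_of_two(attributes, rng):
    # Hypothetical generator: name a new attribute as the sum of two existing ones.
    a, b = rng.choice(attributes), rng.choice(attributes)
    return f"({a}+{b})"


rng = random.Random(42)
individual = ["a1", "a2"]
print(generating_mutation(individual, ["a1", "a2", "a3"], 0.8, rng, sum_of_two))
```

Because removal is as likely as both kinds of addition together, the expected length of the feature vector stays roughly constant unless fitness favours growth or shrinkage.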
A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. For the basic genetic algorithm, see the description of the <reference key="operator.optimize_selection_evolutionary">Optimize Selection (Evolutionary)</reference> operator.
This operator is a nested operator, i.e. it has a subprocess. The subprocess must return a performance vector. You need a basic understanding of subprocesses in order to apply this operator; see the documentation of the <reference key="operator.subprocess">Subprocess</reference> operator.
Differentiation
Optimize by Generation (YAGGA2)
The YAGGA2 operator is an improved version of the usual YAGGA operator. It allows more feature generators and provides several techniques for redundancy prevention. This leads to smaller ExampleSets containing fewer redundant features.
Input
example set in
This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.
Output
example set out
The genetic algorithm is applied to the input ExampleSet. The resulting ExampleSet is delivered through this port.
attribute weights out
The attribute weights are delivered through this port.
performance out
This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.
Parameters
Limit max total number of attributes
This parameter indicates if the total number of attributes in all generations should be limited. If set to true, the maximum number is specified by the max total number of attributes parameter.
Max total number of attributes
This parameter is only available when the limit max total number of attributes parameter is set to true. This parameter specifies the maximum total number of attributes in all generations.
Use local random seed
This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
Local random seed
This parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true.
Show stop dialog
This parameter determines if a dialog with a stop button should be displayed. The button stops the search for the best feature space. If the search is stopped, the best individual found up to that point is returned.
Maximal fitness
This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.
Population size
This parameter specifies the population size i.e. the number of individuals per generation.
Maximum number of generations
This parameter specifies the number of generations after which the algorithm should be terminated.
Use plus
This parameter indicates if the summation function should be applied for generation of new attributes.
Use diff
This parameter indicates if the difference function should be applied for generation of new attributes.
Use mult
This parameter indicates if the multiplication function should be applied for generation of new attributes.
Use div
This parameter indicates if the division function should be applied for generation of new attributes.
Use reciprocals
This parameter indicates if the reciprocal function should be applied for generation of new attributes.
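As a rough illustration of what the five generator switches enable, the sketch below builds candidate attributes from numeric columns. It is a simplified stand-in, not the operator's code: the function name `generate_attributes`, the naming scheme for new attributes, and the choice to return NaN on division by zero are all assumptions, and the flags default to off here purely for illustration (these are not the operator's defaults). The operator itself generates attributes one at a time during mutation, whereas this sketch enumerates all candidates.

```python
import math
from itertools import combinations


def generate_attributes(columns, *, use_plus=False, use_diff=False,
                        use_mult=False, use_div=False, use_reciprocals=False):
    """Return candidate attributes derived from numeric columns.

    `columns` maps attribute names to equal-length lists of numbers.
    """
    def safe_div(x, y):
        # Assumption: undefined quotients become NaN instead of raising.
        return x / y if y != 0 else math.nan

    binary_ops = []
    if use_plus:
        binary_ops.append(("+", lambda x, y: x + y))
    if use_diff:
        binary_ops.append(("-", lambda x, y: x - y))
    if use_mult:
        binary_ops.append(("*", lambda x, y: x * y))
    if use_div:
        binary_ops.append(("/", safe_div))

    generated = {}
    # Binary generators combine each unordered pair of attributes once.
    for a, b in combinations(sorted(columns), 2):
        for symbol, op in binary_ops:
            generated[f"({a}{symbol}{b})"] = [
                op(x, y) for x, y in zip(columns[a], columns[b])
            ]
    # The reciprocal generator works on a single attribute.
    if use_reciprocals:
        for a in sorted(columns):
            generated[f"1/({a})"] = [safe_div(1.0, x) for x in columns[a]]
    return generated


columns = {"a": [1.0, 2.0], "b": [4.0, 0.0]}
print(sorted(generate_attributes(columns, use_plus=True, use_reciprocals=True)))
```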
Use early stopping
This parameter enables early stopping. If it is not set to true, the maximum number of generations is always performed.
Generations without improval
This parameter is only available when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping: the optimization stops after n generations without improvement in the performance, where n is the value of this parameter.
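The interplay of the three stop criteria (maximal fitness, maximum number of generations, and early stopping) can be sketched as a simple loop. This is a minimal sketch under assumptions: `run_generations` and `evaluate_generation` are hypothetical names, the callback stands in for evolving and evaluating one generation, and the order in which the criteria are checked is a guess rather than documented behaviour.

```python
def run_generations(evaluate_generation, max_generations,
                    maximal_fitness=float("inf"),
                    use_early_stopping=False,
                    generations_without_improval=2):
    """Run until one stop criterion fires; return (best, generation, reason)."""
    best = float("-inf")
    stale = 0  # consecutive generations without improvement
    for generation in range(1, max_generations + 1):
        fitness = evaluate_generation(generation)
        if fitness > best:
            best, stale = fitness, 0
        else:
            stale += 1
        if best >= maximal_fitness:
            return best, generation, "maximal fitness reached"
        if use_early_stopping and stale >= generations_without_improval:
            return best, generation, "no improvement"
    return best, max_generations, "maximum number of generations"


# Best fitness per generation for a made-up run.
history = [0.5, 0.6, 0.6, 0.6, 0.9]
print(run_generations(lambda g: history[g - 1], 5, use_early_stopping=True))
```

In this made-up run, early stopping ends the search before the late improvement in the fifth generation is reached, which is the usual trade-off of a small n.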
Tournament size
This parameter specifies the fraction of the current population which should be used as tournament members.
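A tournament whose size is given as a fraction of the population can be sketched as follows. The function name `tournament_select`, the rounding of the fraction to a member count, and sampling without replacement are assumptions made for illustration; the operator's exact selection scheme is not described here.

```python
import random


def tournament_select(population, fitness, tournament_fraction, rng):
    """Pick one individual: draw a tournament and return its fittest member."""
    # Convert the fraction into a member count, with at least one contestant.
    size = max(1, int(round(tournament_fraction * len(population))))
    size = min(size, len(population))
    # Assumption: contestants are drawn without replacement.
    contestants = rng.sample(range(len(population)), size)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]


population = ["A", "B", "C", "D"]
fitness = [0.1, 0.9, 0.4, 0.3]
rng = random.Random(1)
print(tournament_select(population, fitness, 0.5, rng))
```

A larger fraction increases selection pressure: with a fraction of 1.0 every tournament contains the whole population, so the fittest individual always wins.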