Stem Tokens using ExampleSet
Synopsis
Replaces terms using pattern-matching rules. This operator uses an ExampleSet to stem a list of words inside a "Process Documents" operator.
Description
This operator can be used inside a "Process Documents" operator and lets you provide a custom list of stemming rules. It works like the Stem (Dictionary) operator, except that the input here is an ExampleSet rather than a file.
It reduces terms to a base form using an external ExampleSet with replacement rules. The ExampleSet must contain one rule per line, in the form
targetExpression:pattern1 pattern2 ...
where targetExpression is the term to which an input term is reduced if it matches any of the patterns. Each patternX is a simple string or a regular expression. A simple example is the mapping
weekday : .*day
Keep in mind that very short words are filtered out by the default settings of the TextInput operators.
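To illustrate the rule format, the following Python sketch applies rules of the form targetExpression:pattern1 pattern2 ... to a list of tokens. It is not the operator's implementation. The function names parse_rule and stem_tokens are hypothetical, and the sketch assumes that a pattern must match the whole token and that the first matching rule wins; the operator itself may behave differently on both points.

```python
import re


def parse_rule(line):
    """Split 'target:pattern1 pattern2 ...' into (target, compiled patterns)."""
    target, _, patterns = line.partition(":")
    return target.strip(), [re.compile(p) for p in patterns.split()]


def stem_tokens(tokens, rule_lines):
    """Replace each token by the target of the first rule it matches.

    Assumption: a pattern has to match the entire token (fullmatch).
    Tokens that match no rule are passed through unchanged.
    """
    rules = [parse_rule(line) for line in rule_lines if line.strip()]
    stemmed = []
    for token in tokens:
        replacement = token
        for target, patterns in rules:
            if any(p.fullmatch(token) for p in patterns):
                replacement = target
                break
        stemmed.append(replacement)
    return stemmed


if __name__ == "__main__":
    rules = ["weekday : .*day"]
    print(stem_tokens(["monday", "friday", "report"], rules))
    # prints ['weekday', 'weekday', 'report']
```

Note that a broad pattern such as .*day also matches tokens like "today" or "holiday", so rules of this kind usually need to be more specific in practice.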
Input
doc
The documents input port.
exa
The ExampleSet with the tokens.
Output
doc
The resulting document.
Parameters
Attribute
The name of the attribute that should be used for stemming.