Generate n-Grams (Characters)
Synopsis
Creates character n-grams of each token in a document.
Description
This operator creates all possible character n-grams of each token in a document. A character n-gram is a contiguous series of n characters. For each token, the operator generates every series of n consecutive characters contained in that token. If a token is shorter than the specified length n, the token itself is kept in the resulting document.
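The definition above can be illustrated with a minimal Python sketch. It is not the operator's actual implementation: the function name char_ngrams is hypothetical, and the sketch assumes that the "series of characters" are contiguous and are emitted from left to right.

```python
def char_ngrams(token: str, n: int) -> list[str]:
    """Return all contiguous character n-grams of a single token.

    Hypothetical helper for illustration only; not part of the operator.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A token shorter than n contains no n-gram, so the token itself is kept.
    if len(token) < n:
        return [token]
    # Slide a window of width n over the token, one character at a time.
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(char_ngrams("token", 3))  # ['tok', 'oke', 'ken']
print(char_ngrams("to", 3))     # ['to']
```

A token of length m with m >= n yields m - n + 1 n-grams under this reading.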
Input
document
The document whose tokens are to be processed.
Output
document
The document containing the generated n-grams.
Parameters
Length
The length n of the n-grams.
Keep terms
Indicates whether the original terms (i.e., tokens) should be kept along with the created n-grams.
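How the two parameters might interact can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's code: the function name generate_char_ngrams and its arguments are hypothetical stand-ins for the Length and Keep terms parameters, a document is modeled as a plain list of token strings, and placing each kept term directly before its n-grams is an assumed ordering.

```python
def generate_char_ngrams(tokens, length, keep_terms=False):
    """Apply character n-gram generation to a list of tokens.

    Hypothetical sketch: 'length' mirrors the Length parameter and
    'keep_terms' mirrors the Keep terms parameter.
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    result = []
    for token in tokens:
        if len(token) < length:
            # Tokens shorter than the n-gram length are kept as they are.
            result.append(token)
            continue
        if keep_terms:
            # Assumed ordering: the original term precedes its n-grams.
            result.append(token)
        # All contiguous substrings of exactly 'length' characters.
        result.extend(
            token[i:i + length] for i in range(len(token) - length + 1)
        )
    return result


doc = ["data", "is"]
print(generate_char_ngrams(doc, 3))                   # ['dat', 'ata', 'is']
print(generate_char_ngrams(doc, 3, keep_terms=True))  # ['data', 'dat', 'ata', 'is']
```

In this sketch, a token whose length equals the n-gram length appears twice when keep_terms is set, once as the original term and once as its only n-gram. Whether the operator does the same is not stated in the description above.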