Generate n-Grams (Characters)

Synopsis

Creates character n-grams from each token in a document.

Description

This operator creates all possible character n-grams of each token in a document. A character n-gram is a contiguous sequence of n characters, so the n-grams generated for a token are all of its contiguous substrings of length n. For example, with n = 3 the token "text" yields "tex" and "ext". If a token is shorter than the specified length n, the token itself is kept unchanged in the resulting document.
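The behavior described above can be illustrated with a short Python sketch. It is not the operator's implementation: it assumes the document is already available as a plain list of token strings, and the function names char_ngrams and generate_char_ngrams are hypothetical. The sketch only shows how each token is replaced by its contiguous substrings of length n, with tokens shorter than n passed through unchanged.

```python
def char_ngrams(token: str, n: int) -> list[str]:
    """Return all contiguous substrings of `token` with length n.

    A token shorter than n is returned unchanged, following the
    behavior described for the operator.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(token) < n:
        return [token]
    # Slide a window of width n over the token, one character at a time.
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def generate_char_ngrams(tokens: list[str], n: int) -> list[str]:
    """Replace every token of a tokenized document (assumed to be a
    list of strings) by its character n-grams, preserving token order."""
    result: list[str] = []
    for token in tokens:
        result.extend(char_ngrams(token, n))
    return result


print(generate_char_ngrams(["text", "is", "mining"], 3))
```

With n = 3, "text" produces "tex" and "ext", "is" is kept as it is because it is shorter than 3 characters, and "mining" produces "min", "ini", "nin" and "ing".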

Input

document

The tokenized document whose tokens are split into character n-grams.

Output

document

The document containing the generated n-grams, plus the original tokens if Keep terms is enabled.

Parameters

Length

The length n of the n-grams.

Keep terms

Indicates whether the original terms (i.e., tokens) are kept alongside the generated n-grams.
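The effect of the two parameters can be sketched in Python as follows. This is an illustration under stated assumptions, not the operator's code: the document is modeled as a list of token strings, the parameter names length and keep_terms are hypothetical stand-ins for Length and Keep terms, and two details are guesses. First, the original token is placed before its n-grams. Second, a token is not emitted twice when its only n-gram is the token itself (a token shorter than, or exactly as long as, the length).

```python
def char_ngrams(token: str, n: int) -> list[str]:
    """All contiguous substrings of length n; shorter tokens are kept unchanged."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(token) < n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def generate_char_ngrams(tokens: list[str], length: int,
                         keep_terms: bool = False) -> list[str]:
    """Hypothetical model of the operator's Length and Keep terms parameters."""
    result: list[str] = []
    for token in tokens:
        grams = char_ngrams(token, length)
        # Assumption: the original token precedes its n-grams, and it is
        # skipped when its only n-gram is the token itself (no duplicates).
        if keep_terms and grams != [token]:
            result.append(token)
        result.extend(grams)
    return result


print(generate_char_ngrams(["text", "is"], 3))
print(generate_char_ngrams(["text", "is"], 3, keep_terms=True))
```

Without keep_terms the output contains only "tex", "ext" and "is". With keep_terms enabled, "text" is also retained ahead of its n-grams, while "is" still appears once.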