Split File by Point

Synopsis

Segments documents at a user-defined split point.

Description

This operator extracts segments from a set of text documents in a directory by splitting each document into parts. The split point is described by a regular expression.
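The behaviour described above can be approximated outside the operator. The following Python sketch is not the operator's implementation. The function name `split_directory`, the segment naming scheme `<stem>_<index>.txt`, and the UTF-8 default encoding are assumptions made for illustration only.

```python
import re
from pathlib import Path


def split_directory(texts_dir, output_dir, split_expression, encoding="utf-8"):
    """Split every file in texts_dir at matches of split_expression.

    Hypothetical helper: the output naming scheme is an assumption,
    not the operator's documented behaviour. The expression is assumed
    to contain no capturing groups, since re.split would otherwise
    return the captured text as extra segments.
    """
    pattern = re.compile(split_expression)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(texts_dir).iterdir()):
        if not source.is_file():
            continue
        text = source.read_text(encoding=encoding)
        # Drop empty segments produced by leading, trailing, or adjacent matches.
        segments = [s for s in pattern.split(text) if s]
        for index, segment in enumerate(segments):
            target = out / f"{source.stem}_{index}.txt"
            target.write_text(segment, encoding=encoding)
            written.append(target)
    return written
```

Calling `split_directory("texts", "segments", r"\n")` would write one file per non-empty line of each input document. The text matched by the split expression itself is discarded in this sketch; whether the operator keeps or drops it is not stated here.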

Input

through

The through port.

Output

through

The through port.

Parameters

Preview

Shows a preview of the results that the current configuration will produce.

Texts

The directory containing the documents to be segmented.

Output directory

The directory to which the segments are written.

Split expression

Specifies the split points in the documents using a regular expression. For example, an expression that matches a line break splits the document at every line break.
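To make the idea of a split expression concrete, here is a small Python illustration using the standard `re` module. The patterns shown are examples chosen for this sketch, not values taken from the operator, and the operator's regular-expression dialect may differ from Python's in some details.

```python
import re

text = "first line\nsecond line\r\nthird line"

# Split on every line break, accepting both Unix and Windows line endings.
segments = re.split(r"\r?\n", text)
print(segments)  # ['first line', 'second line', 'third line']

# Split on blank lines so that each paragraph becomes one segment.
paragraphs = re.split(r"\n\s*\n", "para one\n\npara two\n\n\npara three")
print(paragraphs)  # ['para one', 'para two', 'para three']
```

A narrower expression yields fewer, larger segments, so testing the pattern on a sample document before running it on a whole directory is usually worthwhile.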

Use file extension as type

If checked, the type of the files will be determined by their extensions. Unknown extensions will be treated as text files.

Content type

The content type of the input texts.

Encoding

The encoding used for reading or writing files.