Documents to Data

Synopsis

Generates a data set from documents.

Description

This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set. The text of the document is stored in a nominal attribute. If a label or metadata is associated with the documents, a label attribute or attributes for the metadata are created, respectively.
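The mapping described above can be illustrated with a minimal Python sketch. This is a conceptual illustration only, not the operator's implementation or API: the `Document` class, the `documents_to_data` function, and the default attribute names are all hypothetical, chosen to mirror the parameters listed below.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    # Hypothetical stand-in for a document; not a class of the actual product.
    text: str
    label: Optional[str] = None
    meta: dict = field(default_factory=dict)


def documents_to_data(documents, text_attribute="text",
                      label_attribute="label", add_meta_information=True):
    """Build one row (example) per document.

    The default attribute names are assumptions made for this sketch.
    """
    rows = []
    for doc in documents:
        row = {}
        # Metadata such as filename or date becomes additional attributes.
        if add_meta_information:
            row.update(doc.meta)
        # The document text is always stored in the text attribute.
        row[text_attribute] = doc.text
        # A label attribute is only created when the document has a label.
        if doc.label is not None:
            row[label_attribute] = doc.label
        rows.append(row)
    return rows


docs = [
    Document("first text", label="spam", meta={"filename": "a.txt"}),
    Document("second text", label="ham", meta={"filename": "b.txt"}),
]
rows = documents_to_data(docs)
```

In this sketch, `rows` holds one dictionary per document, with the text, the label, and any metadata as separate keys. Passing `add_meta_information=False` leaves the metadata keys out.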

Input

documents

The collection of documents to convert into a data set.

Output

example set

The generated example set, containing one example per input document.

Parameters

Text attribute

The name of the nominal attribute that stores the document text.

Label attribute

The name of the attribute that stores the document label, if a label is present.

Add meta information

If checked, available meta information about the text, such as filename and date, is added as attributes.

Datamanagement

Determines how the data is represented internally.