Documents to Data
Synopsis
Generates a data set from documents.
Description
This operator generates a data set from a collection of documents. For each document in the collection, one example is added to the data set. The text contained in the document is stored in a nominal attribute. If the documents carry a label, a label attribute is created; if they carry meta data, one attribute is created for each piece of meta data.
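The mapping described above can be sketched in plain Python. This is only an illustration of the idea, not the operator's actual implementation: the `Document` class, the `documents_to_data` function, and its keyword arguments are hypothetical names chosen to mirror the parameters listed below, and missing values are represented here as `None`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus optional label and meta data."""
    text: str
    label: Optional[str] = None
    meta: dict = field(default_factory=dict)


def documents_to_data(documents, text_attribute="text", label_attribute="label",
                      add_meta_information=True):
    """Return one row (example) per document, as a list of dicts."""
    # Assumption: a label column is created only if some document has a label.
    has_label = any(doc.label is not None for doc in documents)

    # Collect meta data keys in first-seen order so the columns are stable.
    meta_keys = []
    if add_meta_information:
        for doc in documents:
            for key in doc.meta:
                if key not in meta_keys:
                    meta_keys.append(key)

    rows = []
    for doc in documents:
        row = {text_attribute: doc.text}
        if has_label:
            row[label_attribute] = doc.label
        for key in meta_keys:
            # None marks a missing value for documents lacking this key.
            row[key] = doc.meta.get(key)
        rows.append(row)
    return rows
```

In this sketch, calling `documents_to_data` on a list of `Document` objects yields one dictionary per document, keyed by attribute name. A meta data key that collides with the text or label attribute name is not handled.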
Input
documents
The collection of documents to convert.
Output
example set
The generated example set, containing one example per input document.
Parameters
Text attribute
The name of the nominal attribute that stores the document text.
Label attribute
The name of the attribute that stores the label, if the documents have one.
Add meta information
If checked, the available meta information of each text, such as file name and date, is added as attributes.
Datamanagement
Determines how the data is represented internally.