
Read XRFF

Synopsis

This operator reads XRFF (eXtensible attribute-Relation File Format) files.

Description

This operator reads XRFF files, a format known from Weka. XRFF (eXtensible attribute-Relation File Format) is an XML-based extension of the ARFF format and is in some respects similar to the original RapidMiner file format for attribute description files (.aml). The attached Example Process includes a sample XRFF file.

Because the data is wrapped in XML tags, the XML representation takes up considerably more space, so the file can also be compressed with gzip. RapidMiner automatically treats a file as gzip-compressed if its extension is .xrff.gz instead of .xrff.
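The extension-based detection described above can be sketched outside RapidMiner as follows. This is only an illustration in Python, not RapidMiner's implementation; the function name open_xrff is hypothetical, and UTF-8 encoding is an assumption.

```python
import gzip
from pathlib import Path


def open_xrff(path):
    """Open an XRFF file as text (hypothetical helper, not a RapidMiner API).

    The file is treated as gzip-compressed only when its name ends in
    .xrff.gz, mirroring the extension rule described above.
    UTF-8 is assumed for both variants.
    """
    name = Path(path).name.lower()
    if name.endswith(".xrff.gz"):
        # gzip.open in text mode decompresses transparently while reading.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

The decision is made on the file name alone, so a compressed file saved with a plain .xrff extension would be read as uncompressed text in this sketch.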

An XRFF file is divided into two parts: the header and the body. The header contains the metadata description and the body contains the instances. Setting class="yes" on an attribute specification in the header marks that attribute as the prediction label. Although RapidMiner calls this attribute a "label" rather than a "class", the class terminology is supported for compatibility with original XRFF files.
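To make the header/body layout concrete, here is a small Python sketch that parses a tiny XRFF document and picks out the attribute marked with class="yes". The sample data, the dataset name, and the function read_xrff are invented for illustration; the element names follow the XRFF layout used by Weka. This is not how the operator itself is implemented.

```python
import xml.etree.ElementTree as ET

# Invented sample data: two attributes in the header, two instances in the body.
SAMPLE_XRFF = """<dataset name="weather-sample">
  <header>
    <attributes>
      <attribute name="temperature" type="numeric"/>
      <attribute name="play" type="nominal" class="yes">
        <labels>
          <label>yes</label>
          <label>no</label>
        </labels>
      </attribute>
    </attributes>
  </header>
  <body>
    <instances>
      <instance><value>21.5</value><value>yes</value></instance>
      <instance><value>14.0</value><value>no</value></instance>
    </instances>
  </body>
</dataset>
"""


def read_xrff(text):
    """Return (attribute names, label attribute name or None, rows of raw strings)."""
    root = ET.fromstring(text)
    attrs = root.findall("./header/attributes/attribute")
    names = [a.get("name") for a in attrs]
    # The attribute flagged with class="yes" is the prediction label.
    label = next((a.get("name") for a in attrs if a.get("class") == "yes"), None)
    rows = [
        [v.text for v in inst.findall("value")]
        for inst in root.findall("./body/instances/instance")
    ]
    return names, label, rows


names, label, rows = read_xrff(SAMPLE_XRFF)
```

Values are kept as raw strings here; converting them according to each attribute's declared type is left out to keep the sketch short.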

Input

file

This optional port expects a file object.

Output

output

The XRFF file is read from the specified path and the resultant ExampleSet is delivered through this port.

Parameters

Data file

This parameter specifies the path of the XRFF file. The file can be selected using the "choose a file" button.

Id attribute

This parameter specifies the name of the id attribute. Please note that this field is case-sensitive.

Datamanagement

This expert parameter determines how the data is represented internally. Several options are available, and users can choose any of them.

Decimal point character

This parameter specifies the character that is used as decimal point.
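As an illustration of what a configurable decimal point means when numeric values are parsed, consider this minimal Python sketch. The function parse_number is hypothetical, and the sketch assumes the values contain no thousands separators.

```python
def parse_number(text, decimal_point="."):
    """Parse a numeric string that uses the given decimal point character.

    Hypothetical helper; assumes the text has no thousands separators.
    """
    if decimal_point != ".":
        text = text.replace(decimal_point, ".")
    return float(text)
```

With a comma as the decimal point character, parse_number("3,14", ",") yields the same value as parse_number("3.14").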

Sample ratio

This parameter specifies the fraction of the data set which should be read. If it is set to 1, the complete data set is read. If it is set to -1 then the sample size parameter is used for determining the size of the data to read.

Sample size

This parameter specifies the exact number of samples which should be read. If it is set to -1 the sample ratio parameter is used for determining the size of data to read. If both are set to -1 the complete data set is read.
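The interaction between sample ratio and sample size can be sketched as a small Python function. The function rows_to_read is hypothetical. Two points are assumptions that the descriptions above do not settle: that sample size takes precedence when neither parameter is -1, and that a fractional row count is truncated.

```python
def rows_to_read(total_rows, sample_ratio=1.0, sample_size=-1):
    """Return how many rows would be read under the two sampling parameters.

    Hypothetical sketch, not RapidMiner's implementation.
    """
    if sample_size != -1:
        # Assumption: an explicit sample size wins over the ratio.
        return min(sample_size, total_rows)
    if sample_ratio != -1:
        # Assumption: a fractional row count is truncated.
        return int(total_rows * sample_ratio)
    # Both parameters are -1: read the complete data set.
    return total_rows
```

For a data set of 200 rows, a ratio of 0.25 with sample size -1 gives 50 rows, a sample size of 50 with ratio -1 also gives 50 rows, and setting both to -1 gives all 200 rows.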

Use local random seed

This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
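The effect of a fixed seed can be illustrated with Python's standard random module. This is a generic sketch of seeded sampling, not RapidMiner's random number generator; the function sample_rows and the default seed value 42 are arbitrary choices made for the example.

```python
import random


def sample_rows(rows, k, use_local_random_seed=False, local_random_seed=42):
    """Draw k rows at random; a fixed seed makes the draw repeatable.

    Hypothetical helper; the default seed of 42 is arbitrary.
    """
    if use_local_random_seed:
        rng = random.Random(local_random_seed)
    else:
        # Without a fixed seed, each call may return a different sample.
        rng = random.Random()
    return rng.sample(rows, k)
```

Calling sample_rows twice with use_local_random_seed=True and the same seed returns the same rows in the same order, which is the behavior the parameter description refers to.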

Local random seed

This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.