
Input / Output

In the cloud, RapidMiner has no access to your local file system. Hence, to put your data to work, you must first upload it to the catalog.

As stated previously, you can upload files of any data format to the catalog. Nevertheless, we distinguish between two different cases:

  • HDF5: the native data file format of RapidMiner. You can find your HDF5 data files in RapidMiner Studio in the folder Documents/RapidMiner, with the extension rmhdf5table.
  • Other: to use any other data format, such as CSV or Excel, you need to connect the input to the relevant operator in the workflow designer (e.g., Read CSV).
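The distinction between the two cases can be sketched as a small lookup. This is only an illustration in Python, not part of RapidMiner: the function `required_reader`, the file names, and the operator name "Read Excel" are assumptions made for the example; only the `rmhdf5table` extension and the Read CSV operator come from the text above.

```python
from pathlib import PurePath

# Extension of RapidMiner's native HDF5 data files, as stated above.
NATIVE_EXTENSION = ".rmhdf5table"

# Hypothetical mapping from file extension to the reader operator a
# workflow would need. "Read CSV" is named in the text; "Read Excel"
# is an assumed operator name used for illustration.
READERS = {
    ".csv": "Read CSV",
    ".xlsx": "Read Excel",
    ".xls": "Read Excel",
}


def required_reader(filename):
    """Return None for native files, else the name of the reader operator."""
    suffix = PurePath(filename).suffix.lower()
    if suffix == NATIVE_EXTENSION:
        return None  # native format: no reader operator needed
    if suffix in READERS:
        return READERS[suffix]
    raise ValueError(f"no reader known for {filename!r}")


for name in ("titanic.rmhdf5table", "titanic.csv"):
    reader = required_reader(name)
    print(name, "->", reader or "usable directly")
```

In this sketch, a native file maps to no reader at all, while every other format must pass through a reader operator before its content is available as a data table.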

To understand the implications of this statement, examine the following screenshot. Two files containing identical data tables appear in the Data tab of the Titanic project: one in CSV format and one in rmhdf5table format (RapidMiner HDF5). Note that only one of the files -- the one in rmhdf5table format -- offers the action Start Auto ML.

Data: CSV and RMHDF5

Suppose, however, that you only have a CSV file available, and you want to use Auto ML -- what do you do? You need to convert your CSV file to rmhdf5table format.

Let's start from scratch, assuming you have:

  • an existing project called Titanic and
  • a CSV file on the local file system.

From this starting point, we will recreate the state displayed in the above screenshot.

Add data

From the Data tab of the Titanic project, click on Add Data. A file selector appears, and you can upload your CSV file to the project.

Add CSV data

Since the CSV file was uploaded from within the project -- not via the catalog -- the owner of the data is the project, in this case Titanic.

Create workflow

Switch to the Content tab, and select Create Workflow. Name your workflow and (optionally) describe its purpose:

Create workflow

Within the workflow, you are presented with a blank canvas and some control structures:

  • Operators - gives you access to RapidMiner's operators
  • Data - gives you access to data in the catalog
  • Project - gives you access to other resources in the project

A search field allows you to search in each of these categories. Once you locate the object of interest, you can drag it onto the canvas. In what follows, we will create a simple workflow.

Input

Select the Data tab, and drag Titanic-CSV to the canvas:

Drag data from catalog

Read CSV

Select the Operators tab, type Read CSV into the search field, and drag the Read CSV operator to the canvas:

Drag Read CSV operator

Output

Select the Operators tab, type Output into the search field, and drag the Output operator to the canvas:

Drag Output operator

Complete workflow

At this stage, you have all the components, but they are still disconnected.

Disconnected workflow

To complete the workflow, take the following steps:

  1. Connect the operators by clicking their ports.

  2. Hover over the Output operator, and select Open Parameter Panel from the icons displayed below it.

  3. In the Output parameter panel:

    • Select Save Results.
    • Under File Location, select Data and write Titanic as the name of the data file.

  4. Click Run Step.

Save results
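As a rough mental model of the data flow just assembled (data file, then Read CSV, then Output), the following Python sketch mimics the three steps locally. It does not use any RapidMiner API: the functions `read_csv` and `output`, the `catalog` dictionary, and the sample rows are all invented for illustration, and the saved result here is an in-memory table rather than an actual rmhdf5table file.

```python
import csv
import io


def read_csv(raw_text):
    """Stand-in for the Read CSV step: parse raw text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def output(table, catalog, save_results=False, name=None):
    """Stand-in for the Output step.

    Always returns the table (for display); when save_results is set,
    also stores it in the catalog under the given name.
    """
    if save_results:
        if not name:
            raise ValueError("a data file name is required when saving results")
        catalog[name] = table
    return table


# Made-up sample rows; not the real Titanic data.
raw = "Name,Age,Survived\nAlice,29,Yes\nBob,41,No\n"

# Hypothetical in-memory catalog holding the uploaded CSV as raw text.
catalog = {"Titanic-CSV": raw}

# Connect the steps: data -> Read CSV -> Output (Save Results, name "Titanic").
shown = output(read_csv(catalog["Titanic-CSV"]), catalog,
               save_results=True, name="Titanic")

for row in shown:
    print(row)
```

The point of the sketch is the shape of the pipeline: the raw CSV stays in the catalog unchanged, and the Output step both displays the parsed table and stores it as a second, separate entry.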

Results

We have arrived at our destination. The results take two forms, as indicated by the Output parameter panel:

  • The data is displayed.
  • The data is saved in a file called Titanic.

To see the data that has been saved in rmhdf5table format, go to the project's Data tab. For the Filter type, select Data Table, and only the file called Titanic will appear, not Titanic-CSV.

Data: CSV and RMHDF5