
Files

Uploading your data

You can upload new data files using the Add Data button found under Data Assets > Files.

  • If you add data from within a project, the project is the owner.
  • If you add data from within the global view, you are the owner.

Add Data

Supported formats

You can upload files of any format to Data Assets. However, two cases are handled differently:

  • HDF5: The native data file format of Altair AI Cloud. You can find your HDF5 data files in Altair AI Studio in the folder Documents/RapidMiner, with the extension rmhdf5table.
  • Other: To use any other data format, such as CSV or Excel, you need to connect the input to the relevant operator in the workflow designer (e.g., Read CSV).

In practice, the difference is minor: files in other formats simply require one extra step when you develop your workflow.

Linking data to a project

Before you can work with a data file, you must link it to a project.

  • If the project does not exist, create the project.

  • If the project exists, click on the three vertical dots located to the right of the data file, select Link to Project, and then choose a project to which the data should be linked.

Link to a Project

  • If the data set is generated inside a project, it is automatically linked to this project.

Data linked to a project are available to all project contributors and visible to all project viewers. You may link the same data file to multiple projects.

Organizing your data

There are four elements that help you to organize and find your data.

  • Name: A search field lets you filter by file name by typing any substring. If you know a data file's name, searching is the easiest way to locate it.

  • Tags: Data files can have multiple tags, and you can use these tags to filter the set of files you want to see or exclude from the view.

  • Projects: Knowing which projects the data file is linked to will help you understand where it's used and what its potential dependencies are.

  • Filter type: Restricts the view to specific file types, such as Excel or CSV.
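
The way these filters combine can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the DataFile record, its field names, and the filter_files function are hypothetical, and the sketch assumes that name matching is case-insensitive and that a file must carry every included tag.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    # Hypothetical record; the field names are illustrative only.
    name: str
    file_type: str
    tags: set = field(default_factory=set)
    projects: set = field(default_factory=set)


def filter_files(files, name_part="", include_tags=(), exclude_tags=(), file_type=None):
    """Return the files that pass every active filter."""
    result = []
    for f in files:
        # Name filter: any substring matches (case-insensitivity is an assumption).
        if name_part.lower() not in f.name.lower():
            continue
        # Tag filters: require all included tags, reject any excluded tag.
        if not set(include_tags) <= f.tags:
            continue
        if set(exclude_tags) & f.tags:
            continue
        # File type filter: exact match when a type is given.
        if file_type is not None and f.file_type != file_type:
            continue
        result.append(f)
    return result


files = [
    DataFile("sales_2023.csv", "CSV", {"sales"}, {"Forecasting"}),
    DataFile("sales_2023.rmhdf5table", "HDF5", {"sales", "clean"}, {"Forecasting"}),
    DataFile("customers.xlsx", "Excel", {"crm"}),
]

matches = filter_files(files, name_part="sales", exclude_tags={"clean"})
print([f.name for f in matches])
```

Here the name filter keeps both sales files, and the excluded tag then removes the one tagged "clean".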

Data filters

Other actions

From within Data Assets > Files, you can also perform a number of other actions on single or multiple data sets.

  • Link to a project
  • Open in Panopticon
  • Download: You can download data sets to a local folder by selecting Download.
  • Delete: If you no longer require a data set, you can delete it.

Viewing data files

To view the general details and contents of a data file, click its name. The default view of the page that opens depends on the file type.

HDF5 files

Data

When you select an HDF5 file, the page opens on the Data tab by default. This tab shows the data set in tabular form, including:

  • all column names and data types and
  • an indicator of the ratio of missing values in a column.
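
As a rough illustration of what the missing-value indicator expresses, the ratio for a column is the number of missing entries divided by the total number of entries. The sketch below assumes that missing values are represented as None and uses a hypothetical table stored as a dictionary of columns; it does not describe how the product computes the indicator.

```python
def missing_ratio(column):
    """Fraction of entries in a column that are missing (None)."""
    if not column:
        return 0.0  # an empty column has no missing entries
    return sum(value is None for value in column) / len(column)


# Hypothetical example table: column name -> list of values.
table = {
    "price": [9.99, None, 4.50, 12.00],
    "category": ["toys", "books", None, None],
}

ratios = {name: missing_ratio(values) for name, values in table.items()}
print(ratios)
```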

HDF5 data

When the chart icon next to a column name is selected, a popup displaying a plot of the data appears.

Data chart - icon

This plot can be customized via the Chart tab by clicking Open visualization at the bottom of the popup.

Details

Selecting the Details tab shows the general details of the data set.

HDF5 details

From here, you can:

  • Link the data set to projects to which you have access.
  • Add or remove Tags, for better organization.
  • Delete the data set.
  • Download the data set.

Chart

You can create basic and more advanced charts for a data set by selecting the Chart tab.

Data chart

When visualizing your data set, you can:

  • Choose from many available chart types.

  • Customize the chosen chart type.

  • Zoom into the chart area to take a closer look at specific parts.

  • Export the chart in various formats, including JPEG, PDF, PNG, and SVG, from the menu icon on the chart.

Statistics

Clicking on the Statistics tab brings you to the Statistics page.

Data statistics

Here you can identify missing values and analyze basic type-dependent statistics for each of the columns in your data set, such as:

  • earliest and latest dates and duration;

  • min, max, average, and standard deviation; and

  • least- and most-common values.

Click on values to identify duplicate or unique values, or get a count of each value in a column.
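
For readers who want to reproduce similar figures outside the platform, the following Python sketch computes type-dependent statistics of the kind listed above. It is an assumption-laden illustration: the function names are hypothetical, missing values are assumed to be None, and the sample standard deviation is used, which may differ from what the Statistics page reports.

```python
import statistics
from collections import Counter
from datetime import date


def numeric_stats(values):
    """Min, max, average, and standard deviation of the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "average": statistics.mean(present),
        # Sample standard deviation (an assumption); needs at least two values.
        "std_dev": statistics.stdev(present),
    }


def nominal_stats(values):
    """Most and least common values, plus a count of each value."""
    counts = Counter(v for v in values if v is not None)
    ordered = counts.most_common()  # sorted from most to least frequent
    return {
        "most_common": ordered[0][0],
        "least_common": ordered[-1][0],
        "counts": dict(counts),
    }


def date_stats(values):
    """Earliest and latest dates and the duration between them."""
    present = [v for v in values if v is not None]
    return {
        "earliest": min(present),
        "latest": max(present),
        "duration": max(present) - min(present),
    }


price = numeric_stats([2.0, 4.0, 4.0, None, 6.0])
category = nominal_stats(["toys", "books", "toys", None, "games", "toys", "books"])
ordered_on = date_stats([date(2024, 1, 1), None, date(2024, 3, 1)])
print(price, category, ordered_on, sep="\n")
```

A count of 1 in the "counts" dictionary marks a unique value, and any higher count marks a duplicate.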

Nominal values

Click the chart icon at the end of a row to see a basic visualization of the values in the column.

Product category chart

For more detailed charts, see the Chart tab.

Access

If an HDF5 file is added, an additional Access tab appears. Here, you can specify access permissions for the file:

  • Read (read only) or
  • Write (read-write).

Data permissions

Access to data files uploaded via a project is configured at the project level.

Data permissions - Project level

If the data file is linked to a project and a user has access to the project, that user does not need additional permissions to use the file in the same or other projects.

Other files

When you select a file of another type, such as an Excel file, the page opens on its Details view by default.

Other file details

From here, you can select Access to provide permissions for other users to use the data file in their projects.