The Catalog

Data on Altair AI Cloud

Data are critical to any data science project: every project starts with data, and its results may include enriched data.

Altair AI Cloud provides the catalog as an easily accessible shared repository for both:

  • uploaded data and
  • generated data.

Depending on the need, access to a data file can be restricted to a limited number of users or shared by multiple projects.

Global view / project view

The catalog provides global access to your data and has two access points:

  • Catalog
  • Projects

Within the catalog, you can see all the data sets you have uploaded or to which you have access. Data sets generated by running a process that specifies a data table as an output also appear here.

Note, however, that all work is done in the scope of a project, and the project has more limited data access. Each project includes a Data tab, where you can see the data accessible to that particular project.

Uploading your data

You can upload new data files by using the Add Data button found in the catalog or by selecting New > Add Data in the Data tab of any project.

In both cases, the data file is added to the catalog.

Add Data

Notice in the screenshot above that the data set Part Supplier Master List is owned by an individual user, whereas Retail Transactions is owned by a project. When a data set is uploaded using the Add Data button in the catalog, the user who uploaded the file is recorded as the owner. When it is added to the catalog using the New > Add Data button in the Data tab of a project, the project is recorded as the owner. Monthly_Transactions was generated from data files used within a project called Workflow_Designer_Updates; therefore, Workflow_Designer_Updates is the owner of this data set.
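The ownership rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not part of Altair AI Cloud: the `Owner` class, the `resolve_owner` function, and the user name `alice` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Owner:
    """Hypothetical record of who owns a data set."""
    kind: str  # "user" or "project"
    name: str


def resolve_owner(uploading_user: str, project: Optional[str] = None) -> Owner:
    """Return the owner recorded for a new data set.

    `project` is None when the file is uploaded with Add Data in the catalog.
    Otherwise it names the project in which the file was added or generated.
    """
    if project is not None:
        return Owner(kind="project", name=project)
    return Owner(kind="user", name=uploading_user)


# Uploaded from the catalog: the uploading user is the owner.
print(resolve_owner("alice"))
# Added or generated inside a project: the project is the owner.
print(resolve_owner("alice", project="Workflow_Designer_Updates"))
```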

Supported formats

You can upload files of any data format to the catalog. Nevertheless, we distinguish between two different cases:

  • HDF5: The native data file format of Altair AI Cloud. You can find your HDF5 data files in Altair AI Studio in the folder Documents/RapidMiner, with the extension rmhdf5table.
  • Other: To use any other data format, such as CSV or Excel, you need to connect the input to the relevant operator in the workflow designer (e.g., Read CSV).

In practice, the difference is minor. It simply means one extra step when developing your workflow.

Linking data to a project

In order to do anything with a data file, you must first link it to a project.

  • If the project does not exist, create the project.

  • If the project exists, click on the three vertical dots located to the right of the data file, select Link to Project, and then choose a project to which the data should be linked.

Link to a Project

  • If the data set is generated inside a project, it is automatically linked to this project.

Data linked to a project are available to all project contributors and visible to all project viewers. You may link the same data file to multiple projects.

Organizing your data

There are four elements that help you to organize and find your data.

  • Name: A search field lets you filter by file name by typing any substring of it. If you know a data file's name, searching is the easiest way to locate it.

  • Tags: Data files can have multiple tags, and you can use these tags to filter the set of files you want to see or exclude from the view.

  • Projects: Knowing which projects the data file is linked to will help you understand where it's used and what its potential dependencies are.

  • Filter type: The Filter type option restricts the view to specific file types, such as Excel or CSV.

Data filters
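To make the way these four filters combine more concrete, here is a minimal Python sketch. It does not use any Altair AI Cloud API: `CatalogEntry` and `filter_entries` are hypothetical names, and the matching rules (case-insensitive substring match, every included tag required, any excluded tag hides the file) are assumptions made for illustration. The file types, tags, and the project name `Sales` in the sample data are also made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical stand-in for one data file listed in the catalog."""
    name: str
    file_type: str
    tags: FrozenSet[str] = frozenset()
    projects: FrozenSet[str] = frozenset()


def filter_entries(
    entries: Iterable[CatalogEntry],
    name_contains: str = "",
    include_tags: FrozenSet[str] = frozenset(),
    exclude_tags: FrozenSet[str] = frozenset(),
    project: Optional[str] = None,
    file_type: Optional[str] = None,
) -> List[CatalogEntry]:
    """Keep only the entries that pass every active filter."""
    needle = name_contains.lower()
    result = []
    for entry in entries:
        if needle not in entry.name.lower():
            continue  # name filter: substring match (assumed case-insensitive)
        if not include_tags <= entry.tags:
            continue  # tag filter: entry must carry every included tag (assumed)
        if exclude_tags & entry.tags:
            continue  # tag filter: any excluded tag hides the entry (assumed)
        if project is not None and project not in entry.projects:
            continue  # project filter: entry must be linked to the project
        if file_type is not None and entry.file_type != file_type:
            continue  # file type filter
        result.append(entry)
    return result


# Sample data; file types, tags, and the "Sales" project are invented.
catalog = [
    CatalogEntry("Part Supplier Master List", "Excel", frozenset({"suppliers"})),
    CatalogEntry("Retail Transactions", "CSV", frozenset({"sales"}), frozenset({"Sales"})),
    CatalogEntry("Monthly_Transactions", "HDF5", frozenset({"sales", "derived"}),
                 frozenset({"Workflow_Designer_Updates"})),
]

for entry in filter_entries(catalog, name_contains="trans", exclude_tags=frozenset({"derived"})):
    print(entry.name)
```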

Other actions

Through the catalog, you can also perform a number of other actions on single or multiple data sets.

  • Download: You can download data sets to a local folder by selecting Download from the Actions column of the catalog.
  • Set tags: You can set tags for a data set by clicking on the three vertical dots located to the right of the file and selecting Set Tags from the options that display.
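  • Link to a project: You can link a data set to a project by clicking on the three vertical dots located to the right of the file and selecting Link to Project.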
  • Delete: If you no longer require a data set, you can delete it by clicking on the three vertical dots located to the right of the file and selecting Delete.

Viewing data files

The general details and contents of a data file can be viewed by clicking on its name in the catalog. The default view of the page that opens depends on the file type.

HDF5 files

Data

When an HDF5 file is selected from the catalog, the default view of the page that displays next is the Data tab. This page shows you the data set in tabular form, including:

  • all column names and data types and
  • an indicator of the ratio of missing values in a column.

HDF5 data

When the chart icon next to a column name is selected, a popup displaying a plot of the data appears.

Data chart - icon

This plot can be customized via the Chart tab by clicking Open visualization at the bottom of the popup.

Details

Selecting the Details tab shows the general details of the data set.

HDF5 details

From here, you can:

  • Link the data set to projects to which you have access.
  • Add or remove Tags, for better organization.
  • Delete the data set.
  • Download the data set.

Chart

You can create basic and more advanced charts for a data set by selecting the Chart tab.

Data chart

When visualizing your data set, you can:

  • Choose from many available chart types.

  • Customize the chosen chart type.

  • Zoom in on the chart area for a closer look at specific parts.

  • Export the chart in various formats, including JPEG, PDF, PNG, and SVG, from the menu icon on the chart.

Statistics

Clicking on the Statistics tab brings you to the Statistics page.

Data statistics

Here you can identify missing values and analyze basic type-dependent statistics for each of the columns in your data set, such as:

  • earliest and latest dates and duration;

  • min, max, average, and standard deviation; and

  • least- and most-common values.

Click on values to identify duplicate or unique values, or get a count of each value in a column.

Nominal values

Click the chart icon at the end of a row to see a basic visualization of the values in the column.

Product category chart

For more detailed charts, see the Chart tab.
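If you want to reproduce the kind of numbers shown on the Statistics page outside the platform, the following Python sketch computes them for a single column using only the standard library. It is an approximation under stated assumptions, not the platform's implementation: `column_summary` and `value_counts` are hypothetical helper names, missing values are represented as `None`, and the sample standard deviation is used because the documentation does not say which variant is displayed.

```python
import statistics
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def column_summary(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Missing-value ratio plus min, max, average, and standard deviation."""
    present = [v for v in values if v is not None]
    summary = {
        "missing_ratio": (len(values) - len(present)) / len(values) if values else 0.0
    }
    if present:
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["average"] = statistics.fmean(present)
        # Sample standard deviation (an assumption); it needs at least two values.
        summary["std_dev"] = statistics.stdev(present) if len(present) > 1 else 0.0
    return summary


def value_counts(values: Sequence[Optional[Hashable]]) -> List[Tuple[Hashable, int]]:
    """Count each non-missing value, most common first."""
    return Counter(v for v in values if v is not None).most_common()


print(column_summary([1.0, 2.0, None, 3.0]))
print(value_counts(["Toys", "Books", "Toys", None]))
```

The first and last entries returned by `value_counts` correspond to the most- and least-common values of a nominal column.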

Access

If an HDF5 file is added directly to the catalog, an additional Access tab appears. Here, you can specify access permissions for the file:

  • Read (read only) or
  • Write (read-write).

Data permissions

Access to data files brought into the catalog via a project is configured at the project level.

Data permissions - Project level

If the data file is linked to a project and a user has access to the project, that user does not need additional permissions to use the file in the same or other projects.
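The access rules above can be summarized as a small decision function. The Python sketch below is a simplified model for illustration only, not how Altair AI Cloud evaluates permissions: `DataFile`, `can_use`, and the user and project names are hypothetical, and the model does not distinguish between read and write access at the project level.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataFile:
    """Hypothetical model of a data file and its access settings."""
    name: str
    # Per-file permissions set on the Access tab: user -> "read" or "write".
    file_permissions: Dict[str, str] = field(default_factory=dict)
    # Names of the projects the file is linked to.
    linked_projects: Set[str] = field(default_factory=set)


def can_use(user: str, data_file: DataFile, project_members: Dict[str, Set[str]]) -> bool:
    """Return True if the user may use the file.

    A user qualifies either through a permission granted on the file itself
    or through access to any project the file is linked to.
    """
    if user in data_file.file_permissions:
        return True
    return any(
        user in project_members.get(project, set())
        for project in data_file.linked_projects
    )


# Example with made-up users and a made-up project name.
members = {"Sales": {"bob"}}
retail = DataFile("Retail Transactions", linked_projects={"Sales"})
suppliers = DataFile("Part Supplier Master List", file_permissions={"alice": "read"})

print(can_use("bob", retail, members))       # access through the linked project
print(can_use("alice", suppliers, members))  # access through a per-file permission
print(can_use("bob", suppliers, members))    # no permission and no linked project
```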

Other files

When you select a file of another type, such as an Excel file, in the catalog, the page that opens shows its Details by default.

Other file details

From here, you can select Access to provide permissions for other users to use the data file in their projects.