Welcome to the cloud!

If you have used RapidMiner Studio or RapidMiner AI Hub before, our new cloud platform should look familiar.

As before, teams can collaborate on data science projects and deploy the results to generate business value. The new platform doubles down on the idea of a multi-persona, multi-skill environment, but in a setting where the hassle of deployments, upgrades, and maintenance goes away.

Conceptual differences

We know that many of our existing users will want to move to the cloud platform, so we have tried to maintain compatibility. Nevertheless, because of some architectural improvements, the mapping between old and new concepts is not exact.

One of the changes relates to how data is handled and stored. In RapidMiner Studio / RapidMiner AI Hub, data is stored in a local repository, in a remote repository, or in projects. In the cloud, on the other hand, the important concepts are:

Projects

Projects represent use cases; they are central to development and deployment. Because projects are designed for collaboration, permissions are project-wide.

Within a project, you and your team will create and store anything related to the use case, such as processes, analyses, scripts, and models. In short, a project is the home for your work: where you collaborate, train, and deploy.

Catalog

Data is treated separately from projects, for two reasons:

  • To the extent that data is raw material, it is external to the workflow and is not modified by users.
  • Data is also not usually deployed together with the project, even if the project reads, transforms, and enriches it.

That's why all data management is now centralized in what we call the catalog. The catalog protects data with permissions while still allowing sharing and collaboration. It is fully searchable and organized through tags. Data in the catalog can be shared with, or linked to, any number of projects.

Each project includes a Data tab that displays the part of the catalog linked to the project, including both uploaded and generated data. So, from a practical point of view, when you're working on a project, you don't need to think of the catalog as a separate entity.

In short, the catalog is the home for data: the inputs and outputs of projects, including tables, text, images, and more.