In-Database Processing
Introduction
When processing remote data using Altair AI Cloud, there are several different approaches:
-
You can download the data from the remote repository and process it in AI Cloud, before writing it back again, using operators such as
- Read Amazon S3
- Read Google Storage
- Write Database
- Write Salesforce
-
You can process the data within the remote repository, using the repository's native tools:
-
The Execute SQL operator executes SQL directly in a remote database.
-
In-Database Processing allows you to create native AI Cloud processes that are automatically translated into the query language of the remote repository, whether it be Google BigQuery, Oracle, Snowflake, or any other of the supported services.
-
This document will concentrate on the last of these methods, In-Database Processing.
Supported services
Currently the In-Database Processing feature supports the following services:
- Google BigQuery
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Snowflake
We are continuously adding support for other services based on popular demand.
Note that In-Database Processing provides standard SQL support for any SQL database, even if the database is not included in the above list. The differences between supported SQL databases and unsupported SQL databases are the following:
- Supported SQL databases (and indeed any supported services, not just SQL) appear in the SQL dialect parameter list in the In Database Nest operator. Unsupported SQL databases appear as other.
- With supported SQL databases, non-standard functions (e.g.
string_agg
in PostgreSQL) appear in the function list for the operator Generate Attributes (In Database). With unsupported SQL databases, only standard SQL functions are available.
Advantages of in-database processing
Moving large data sets over the internet can be both slow and costly.
With In-Database Processing, you can save both time and money by leveraging the computing power and efficiency of your data storage provider, with the following additional benefits:
- you reduce network congestion
- you limit local memory usage
- you keep your data securely in one place
- you can chain operators as you normally would in the Workflow Designer, and In-Database Processing translates this chain into a complex query that is then executed remotely. In short, it's an SQL query builder.
In Database Nest
The In-Database Processing feature provides a special group of operators that help you to build an SQL query.
- The In Database Nest operator provides a container that is part of your workflow.
- Inside that container, you include a special group of operators marked (In-Database Processing).
When executed, the In Database Nest operator converts the subprocess into a query that operates on the remote data repository using its native query language.
Parameter | Description |
---|---|
Connection Entry | The In Database Nest must provide a connection to the remote data repository. |
Limit Sample Size | Limit the number of rows returned by a query. |
Autodetect SQL dialect | Detect the SQL dialect automatically from the connection settings. If unchecked, you must select the dialect from SQL dialect parameter dropdown list. |
Parallelize | Set to true to concurrently submit multiple queries to the database. Multiple Nest outputs require multiple queries, and setting this parameter to true makes them run in parallel. Note that this may increase the load on your source database significantly. |
Allow All Operators | Set to true to allow using any type of operator inside the Nest, not just In-Database operators. Note that this option should only be considered as a last resort. These operators may not behave as intended, their behavior may depend on inner implementation details, and no backward compatibility is guaranteed if such operators are used inside the Nest. Also note that automatic replacement of non-In-Database operators when they are dragged and dropped into the Nest is disabled if this parameter is set to true. Please only use this option if you know what you are doing and you are ready to experiment. |
Available operators (In-Database Processing)
The complete list of operators belonging to In-Database Processing is given below:
- In Database Nest
Data Access (2)
- Retrieve (In-Database Processing)
- Store (In-Database Processing)
Blending (16)
- Aggregate (In-Database Processing)
- Convert Type (In-Database Processing)
- Custom Query (In-Database Processing)
- Filter Example Range (In-Database Processing)
- Filter Examples (In-Database Processing)
- Generate Attributes (In-Database Processing)
- Generate Rank (In-Database Processing)
- Join (In-Database Processing)
- Rename (In-Database Processing)
- Rename by Replacing (In-Database Processing)
- Reorder Attributes (In-Database Processing)
- Replace (In-Database Processing)
- Sample (In-Database Processing)
- Select Attributes (In-Database Processing)
- Sort (In-Database Processing)
- Union (In-Database Processing)
Cleansing (3)
- Declare Missing Value (In-Database Processing)
- Remove Duplicates (In-Database Processing)
- Replace Missing Values (In-Database Processing)
Utility (2)
- Multiply (In-Database Processing)
- Subprocess (In-Database Processing)