Skip to main content

Athena Online Machine Learning Engine

Synopsis

This operator deploys the Online Machine Learning Engine (OML) on the provided Flink cluster and uses it to train and test a machine learning (ML) model on streamed data events in a streaming analytic workflow - the operator is also able to use the deployed OML Service to score unlabeled streamed date events with the trained ML model.

Description

This operator deploys the Online Machine Learning Engine (OML) on the provided Flink cluster and uses it to train and test a machine learning (ML) model on streamed data events in a streaming analytic workflow - the operator is also able to use the deployed OML Service to score unlabeled streamed date events with the trained ML model

This operators first deploys the OML Engine (developed by the Athena Research Center, jar needs to be provided) to the Flink cluster provided at the flink-connection input port. In addition the required topics (training, forecast input, forecast output, request, response, parameter message) on the kafka cluster for communication with the OML Engine are created as well.

The operator pushes the data events received at the training input port and the input stream port to the corresponding kafka topic. Configuration and requests messages are send to the corresponding topics by the operator as well. The OML Engine trains a ML model on the training input and scores the data events on the forecast input with the trained model.

The operator reads from the forecast output topic of the kafka cluster and pushes the computed scored/forecasted events further downstream (to the output stream port).

This is a streaming operator and needs to be placed inside a Streaming Nest or a Streaming Optimization operator. The operator defines the logical functionality and can be used in all streaming analytic workflow for any supported streaming platform (currently Flink and Spark). The actual implementation used depends on the type of connection connected to the Streaming Nest operator in which this operator is placed.

Input

kafka-connection

The connection to the Kafka Cluster which is used by the Online Machine Learning Engine for communication.

The connection to the Flink Cluster which on which the Online Machine Learning Engine shall be deployed.

training input

The input of the training data events. It needs to receive the output of a preceding streaming operator, to define the flow of data events in the streaming analytic workflow.

input stream

The input of unlabeled data events to be scored. It needs to receive the output of a preceding streaming operator, to define the flow of data events in the streaming analytic workflow.

Output

output stream

The output of the scored/forecasted data events. Connect it to the next Streaming operator to define the flow of the data events in the designed streaming analytic workflow.

Parameters

Training topic

Name of the Kafka topic used by the OML engine to receive training data events.

Forecast input topic

Name of the Kafka topic used by the OML engine to receive input data events to be forecasted/scored.

Forecast output topic

Name of the Kafka topic used by the SDE Service to push forecasted/scored output data events to.

Request topic

Name of the Kafka topic used by the OML engine to receive request messages.

Response topic

Name of the Kafka topic used by the SDE Service to push response messages to.

Parameter message topic

Name of the Kafka topic used by the OML engine to receive parameter messages.

Learner

Name of the ML Model learner used.

Learner hyper parameters

Hyper parameters for the ML Model learner.

Learner parameters

Parameters for the ML Model learner.

Preprocessor hyper parameters

Hyper parameters for the Preprocesser.

Preprocessor parameters

Parameters for the Preprocesser.

Training configuration

Configuration of the training job.

Job jar

Path to the .jar file of the OML Engine.