Athena Online Machine Learning Engine
Synopsis
This operator deploys the Online Machine Learning Engine (OML) on the provided Flink cluster and uses it to train and test a machine learning (ML) model on streamed data events in a streaming analytic workflow - the operator is also able to use the deployed OML Service to score unlabeled streamed date events with the trained ML model.
Description
This operator deploys the Online Machine Learning Engine (OML) on the provided Flink cluster and uses it to train and test a machine learning (ML) model on streamed data events in a streaming analytic workflow - the operator is also able to use the deployed OML Service to score unlabeled streamed date events with the trained ML model
This operators first deploys the OML Engine (developed by the Athena Research Center, jar needs to be provided) to the Flink cluster provided at the flink-connection input port. In addition the required topics (training, forecast input, forecast output, request, response, parameter message) on the kafka cluster for communication with the OML Engine are created as well.
The operator pushes the data events received at the training input port and the input stream port to the corresponding kafka topic. Configuration and requests messages are send to the corresponding topics by the operator as well. The OML Engine trains a ML model on the training input and scores the data events on the forecast input with the trained model.
The operator reads from the forecast output topic of the kafka cluster and pushes the computed scored/forecasted events further downstream (to the output stream port).
This is a streaming operator and needs to be placed inside a Streaming Nest or a Streaming Optimization operator. The operator defines the logical functionality and can be used in all streaming analytic workflow for any supported streaming platform (currently Flink and Spark). The actual implementation used depends on the type of connection connected to the Streaming Nest operator in which this operator is placed.
Input
kafka-connection
The connection to the Kafka Cluster which is used by the Online Machine Learning Engine for communication.
flink-connection
The connection to the Flink Cluster which on which the Online Machine Learning Engine shall be deployed.
training input
The input of the training data events. It needs to receive the output of a preceding streaming operator, to define the flow of data events in the streaming analytic workflow.
input stream
The input of unlabeled data events to be scored. It needs to receive the output of a preceding streaming operator, to define the flow of data events in the streaming analytic workflow.
Output
output stream
The output of the scored/forecasted data events. Connect it to the next Streaming operator to define the flow of the data events in the designed streaming analytic workflow.
Parameters
Training topic
Name of the Kafka topic used by the OML engine to receive training data events.
Forecast input topic
Name of the Kafka topic used by the OML engine to receive input data events to be forecasted/scored.
Forecast output topic
Name of the Kafka topic used by the SDE Service to push forecasted/scored output data events to.
Request topic
Name of the Kafka topic used by the OML engine to receive request messages.
Response topic
Name of the Kafka topic used by the SDE Service to push response messages to.
Parameter message topic
Name of the Kafka topic used by the OML engine to receive parameter messages.
Learner
Name of the ML Model learner used.
Learner hyper parameters
Hyper parameters for the ML Model learner.
Learner parameters
Parameters for the ML Model learner.
Preprocessor hyper parameters
Hyper parameters for the Preprocesser.
Preprocessor parameters
Parameters for the Preprocesser.
Training configuration
Configuration of the training job.
Job jar
Path to the .jar file of the OML Engine.