Synopsis Data Engine
Synopsis
This operator uses the provided Synopsis Data Engine Service to compute a synopsis of the streamed data events in a streaming analytic workflow.
Description
To use this operator's functionality, a Synopsis Data Engine Service (SDE Service) has to be set up. The Kafka cluster and the names of the request, data and output topics used by the SDE Service have to be provided to the operator.
The operator sends configuration messages to the request topic of the Kafka cluster to configure the required synopsis computation. The input data events received at the input stream port are then pushed to the data topic of the Kafka cluster, and the SDE Service computes the synopsis. To retrieve the resulting estimates, the operator sends estimate requests to the SDE Service through the request topic (frequency: estimate frequency). The SDE Service pushes the estimates to the output topic of the Kafka cluster. If estimate continuous is selected, the synopsis is configured to produce continuous estimates, so no estimate requests are sent in this case.
The operator reads from the output topic of the Kafka cluster and pushes the computed estimate events further downstream (to the output stream port).
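The message flow described above can be sketched roughly as follows. This is only an illustration: in-memory lists stand in for the Kafka topics, and the message fields (type, uid, synopsis, continuous), the function name run_operator and the choice to count the estimate frequency in events are assumptions made for this sketch, not the actual SDE Service message format.

```python
from collections import defaultdict


def run_operator(events, topics, uid, synopsis_type,
                 estimate_frequency=2, estimate_continuous=False):
    """Mimic the operator's message flow on in-memory topics (hypothetical format)."""
    # 1. Configure the synopsis computation through the request topic.
    topics["request"].append({
        "type": "configure",
        "uid": uid,
        "synopsis": synopsis_type,
        "continuous": estimate_continuous,
    })
    for count, event in enumerate(events, start=1):
        # 2. Push each input data event to the data topic.
        topics["data"].append(event)
        # 3. Request estimates periodically, unless the synopsis was
        #    configured to produce continuous estimates on its own.
        if not estimate_continuous and count % estimate_frequency == 0:
            topics["request"].append({"type": "estimate", "uid": uid})
    return topics


topics = run_operator(
    events=[{"id": "a", "value": 1}, {"id": "b", "value": 2},
            {"id": "a", "value": 3}, {"id": "c", "value": 4}],
    topics=defaultdict(list),
    uid=1,
    synopsis_type="count min",
)
print(len(topics["request"]), len(topics["data"]))
```

In a real deployment the SDE Service, not the operator, would consume the request and data topics and write the estimates to the output topic.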
This is a streaming operator and needs to be placed inside a Streaming Nest or a Streaming Optimization operator. The operator defines the logical functionality and can be used in any streaming analytic workflow on any supported streaming platform (currently Flink and Spark). The actual implementation used depends on the type of connection connected to the Streaming Nest operator in which this operator is placed.
Input
connection
The connection to the Kafka cluster that the Synopsis Data Engine Service uses for communication.
input stream
The input of this streaming operation. It needs to receive the output of a preceding streaming operator to define the flow of data events in the streaming analytic workflow.
Output
output stream
The output of this streaming operation. Connect it to the next streaming operator to define the flow of data events in the designed streaming analytic workflow.
Parameters
Synopsis type
The synopsis applied to the stream.
- count min: Count-Min synopsis.
- bloom filter: Bloom filter synopsis.
- ams: AMS synopsis.
- dft: DFT correlations synopsis.
- lsh: LSH synopsis.
- core sets: core sets synopsis.
- hyper log log: HyperLogLog synopsis.
- sticky sampling: sticky sampling synopsis.
- lossy counting: lossy counting synopsis.
- chain sampler: chain sampler synopsis.
- gk quantiles: GK quantiles synopsis.
- maritime: maritime synopsis (used for Use Case 3 of the INFORE project).
- top k: top k synopsis.
- optimal distributed window sampling: optimal distributed window sampling synopsis.
- optimal distributed sampling: optimal distributed sampling synopsis.
- window quantiles: window quantiles synopsis.
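To give an idea of what such a synopsis computes, the following is a minimal Count-Min sketch that approximates how often each key occurs in a stream using a fixed amount of memory. It is a generic textbook-style sketch, not the implementation used by the SDE Service; the class name, the default width and depth, and the use of SHA-256 as the hash function are choices made for this illustration.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using `depth` hash rows of `width` counters."""

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum over all rows never underestimates the true count.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


sketch = CountMinSketch()
for key in ["a", "b", "a", "c", "a"]:
    sketch.add(key)
print(sketch.estimate("a"))  # at least 3; larger only if hashes collide
```

The estimate can overcount when different keys hash to the same counters, which is the usual trade-off of a synopsis: bounded memory in exchange for approximate answers.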
Data set key
Data set key.
Synopsis params
The parameters of the selected synopsis type (comma separated).
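As a small illustration of the comma-separated format, the helper below splits such a parameter string into its individual values. The function name parse_params and the example values are hypothetical; which values a given synopsis type expects, and in which order, is defined by the SDE Service and is not shown here.

```python
def parse_params(raw):
    """Split a comma-separated parameter string into trimmed, non-empty parts."""
    return [part.strip() for part in raw.split(",") if part.strip()]


# Example values only; they do not correspond to a specific synopsis type.
params = parse_params("1, 2, 0.01 ,0.99")
print(params)
```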
Synopsis parallelism
The parallelism level of the synopsis computation in the SDE Service.
U id
The unique ID used by the synopsis computation.
Stream id key
Key in the data that should be used as StreamId.
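The idea of selecting the StreamId by key can be sketched as follows, assuming for illustration that a data event is a JSON object. The event layout, the field names and the helper extract_stream_id are hypothetical and only show how a key name maps to a value in the event.

```python
import json


def extract_stream_id(raw_event, stream_id_key):
    """Return the value stored under `stream_id_key` in a JSON-encoded event."""
    event = json.loads(raw_event)
    if stream_id_key not in event:
        raise KeyError(f"event has no field named {stream_id_key!r}")
    return str(event[stream_id_key])


# Hypothetical event: the field "sensor" is used as the StreamId.
raw = json.dumps({"sensor": "s-17", "value": 42.0})
stream_id = extract_stream_id(raw, "sensor")
print(stream_id)
```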
Estimate type
Configure the estimation type for the selected synopsis type.
Estimate frequency
Frequency at which estimate requests are sent to the SDE Service.
Estimate params
Parameters used in the estimate requests (comma separated).
Request topic
Name of the Kafka topic used by the SDE Service to receive request/configuration messages.
Data topic
Name of the Kafka topic used by the SDE Service to receive input data events.
Output topic
Name of the Kafka topic used by the SDE Service to push output data events to.