Synopsis Data Engine

Synopsis

This operator uses a provided Synopsis Data Engine Service to compute synopses of the data events streamed through a streaming analytic workflow.

Description

To use this operator's functionality, a Synopsis Data Engine Service (SDE Service) has to be set up. The Kafka cluster and the names of the request, data and output topics used by the SDE Service have to be provided to the operator.

The operator sends configuration messages to the request topic of the Kafka cluster to configure the required synopsis computation. The data events received at the input stream port are then pushed to the data topic of the Kafka cluster, and the SDE Service computes the synopsis from them. To retrieve the resulting estimates, the operator sends estimate requests to the SDE Service through the request topic, at the rate set by the estimate frequency parameter. The SDE Service pushes the estimates to the output topic of the Kafka cluster. If estimate continuous is selected, the synopsis is configured to produce estimates continuously, so no estimate requests are sent.

The operator reads from the output topic of the Kafka cluster and pushes the computed estimate events further downstream (to the output stream port).
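The message flow between the operator and the SDE Service can be illustrated with a minimal Python sketch. It is not the operator's implementation: the in-memory broker stands in for the Kafka cluster, the message fields ("kind", "continuous") are hypothetical, and the estimate frequency is assumed here to mean "one request every N data events", which this page does not specify.

```python
import json
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for a Kafka cluster: maps a topic name to a list of messages."""

    def __init__(self):
        self.topics = defaultdict(list)

    def send(self, topic, message):
        # Messages are serialized to JSON strings, as they might be on a real topic.
        self.topics[topic].append(json.dumps(message))


def run_operator(broker, events, *, request_topic, data_topic,
                 estimate_frequency, estimate_continuous):
    """Replay the three steps described above (hypothetical message layout)."""
    # 1. Configure the synopsis computation through the request topic.
    broker.send(request_topic,
                {"kind": "configure", "continuous": estimate_continuous})

    for count, event in enumerate(events, start=1):
        # 2. Push every input data event to the data topic.
        broker.send(data_topic, event)

        # 3. Request an estimate periodically, unless the synopsis was
        #    configured to produce estimates continuously.
        if not estimate_continuous and count % estimate_frequency == 0:
            broker.send(request_topic, {"kind": "estimate"})
```

With six input events and an assumed estimate frequency of 3, the request topic receives one configuration message and two estimate requests. With estimate continuous selected, it receives only the configuration message.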

This is a streaming operator and needs to be placed inside a Streaming Nest or a Streaming Optimization operator. The operator defines the logical functionality and can be used in any streaming analytic workflow on any supported streaming platform (currently Flink and Spark). The actual implementation used depends on the type of the connection attached to the Streaming Nest operator in which this operator is placed.

Input

connection

The connection to the Kafka Cluster which is used by the Synopsis Data Engine Service for communication.

input stream

The input of this streaming operation. It needs to receive the output of a preceding streaming operator to define the flow of data events in the streaming analytic workflow.

Output

output stream

The output of this streaming operation. Connect it to the next streaming operator to define the flow of data events in the designed streaming analytic workflow.

Parameters

Synopsis type

The synopsis applied on the stream.

  • count min: Count-Min sketch synopsis.
  • bloom filter: Bloom filter synopsis.
  • ams: AMS sketch synopsis.
  • dft: DFT (discrete Fourier transform) correlations synopsis.
  • lsh: LSH (locality-sensitive hashing) synopsis.
  • core sets: coresets synopsis.
  • hyper log log: HyperLogLog synopsis.
  • sticky sampling: sticky sampling synopsis.
  • lossy counting: lossy counting synopsis.
  • chain sampler: chain sampler synopsis.
  • gk quantiles: GK (Greenwald-Khanna) quantiles synopsis.
  • maritime: maritime synopsis (used for Use Case 3 of the INFORE project).
  • top k: top k synopsis.
  • optimal distributed window sampling: optimal distributed window sampling synopsis.
  • optimal distributed sampling: optimal distributed sampling synopsis.
  • window quantiles: window quantiles synopsis.
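To make concrete what a synopsis computes, here is a minimal Python sketch of a Count-Min sketch, the data structure behind the count min option. It only illustrates the general technique. It is not the SDE Service's implementation, and the default width and depth and the SHA-256 based hashing are choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a fixed-size table of counters."""

    def __init__(self, width=256, depth=4):
        self.width = width    # counters per row
        self.depth = depth    # number of rows, each with its own hash
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over all rows
        # is the tightest estimate and never undercounts.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))


sketch = CountMinSketch()
for key in ["a", "b", "a", "a"]:
    sketch.add(key)
```

After these four updates, sketch.estimate("a") is at least 3. The estimate may exceed the true count when keys collide, but it never falls below it. This is the trade-off that lets a synopsis summarize an unbounded stream in bounded memory.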

Data set key

Data set key.

Synopsis params

The parameters of the selected synopsis type (comma separated).
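As an illustration of the comma-separated format, the following Python sketch splits such a string into individual values. The helper name and the sample values are hypothetical. Whether the operator itself tolerates whitespace around the commas is not stated here, so the stripping is an assumption of this example.

```python
def parse_params(raw):
    """Split a comma-separated parameter string into a list of trimmed values."""
    # Empty fragments (for example from a trailing comma) are dropped.
    return [part.strip() for part in raw.split(",") if part.strip()]


# Hypothetical parameter string; the meaning of each position depends on
# the selected synopsis type.
params = parse_params("0.01, 0.99, 4")
```

The values are kept as strings because their interpretation depends on the synopsis type.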

Synopsis parallelism

The parallelism level of the synopsis computation in the SDE Service.

U id

The unique ID used by the synopsis computation.

Stream id key

Key in the data that should be used as the StreamId.

Estimate type

Configure the estimation type for the selected synopsis type.

Estimate frequency

Frequency at which estimate requests are sent to the SDE Service.

Estimate params

Parameters used in the estimate requests (comma separated).

Request topic

Name of the Kafka topic used by the SDE Service to receive request/configuration messages.

Data topic

Name of the Kafka topic used by the SDE Service to receive input data events.

Output topic

Name of the Kafka topic used by the SDE Service to push output data events to.