Create Embeddings

Synopsis

Calculates embeddings from the given column.

Description

This operator calculates an embedding (a vector in a high-dimensional space) for each row of the given text column. These embeddings can be used as input to machine learning algorithms but also as input to vector stores for performing similarity-based retrieval.
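As an illustration of similarity-based retrieval, the following minimal Python sketch compares embeddings by cosine similarity. It is not the operator's implementation: the function names `cosine_similarity` and `most_similar` are hypothetical, the 3-dimensional vectors are made up (real embeddings typically have hundreds of dimensions), and cosine similarity is assumed here as the measure, although a vector store may use a different one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, candidates):
    """Index of the candidate embedding most similar to the query."""
    return max(
        range(len(candidates)),
        key=lambda i: cosine_similarity(query, candidates[i]),
    )

# Made-up toy embeddings, one per table row (illustrative values only).
row_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_embedding = [0.9, 0.1, 0.0]

best = most_similar(query_embedding, row_embeddings)
print("Most similar row index:", best)
```

Rows whose embeddings point in a similar direction to the query score close to 1, which is the property that similarity-based retrieval relies on.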

Differentiation

Insert Documents

The Insert Documents operator is used to add embeddings to a vector store.

Retrieve Documents

The Retrieve Documents operator is used to find related embeddings in a vector store.

Input

input

The table containing the column which will be converted into an embedding.

onnx model file

Optional ONNX model file used for a custom embedding model.

tokenizer json file

Optional pre-trained Hugging Face tokenizer JSON file for the given onnx_model_file.

Output

embedding

The embeddings created from the given column.

through

The table that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same table in further operators of the process.

Parameters

column

The column that should be converted into an embedding.

embedding model

The embedding model that should be used.

pooling mode

The pooling mode the custom embedding model uses to reduce the per-token embeddings of a sequence to a single embedding.

  • CLS: Uses the first embedding (a special summarizing CLS embedding) of each sequence.
  • MEAN: Uses the average of all embeddings in the sequence.
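The two pooling modes can be sketched in plain Python as follows. This is a simplified illustration, not the operator's code: the function names `cls_pool` and `mean_pool` and the 2-dimensional token embeddings are hypothetical, and the sketch assumes the sequence contains no padding tokens (whether and how the operator masks padding is not stated here).

```python
def cls_pool(token_embeddings):
    """CLS pooling: take the first token's embedding as the sequence embedding."""
    return list(token_embeddings[0])

def mean_pool(token_embeddings):
    """MEAN pooling: average the token embeddings dimension by dimension."""
    n_tokens = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [
        sum(token[d] for token in token_embeddings) / n_tokens
        for d in range(dim)
    ]

# Made-up per-token embeddings for one sequence (3 tokens, 2 dimensions).
# The first entry stands in for the special CLS token.
tokens = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

cls_vector = cls_pool(tokens)
mean_vector = mean_pool(tokens)
print("CLS :", cls_vector)
print("MEAN:", mean_vector)
```

In both cases the result has the same number of dimensions as a single token embedding, so every row yields one fixed-size vector regardless of the text length.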