Add Embedding Layer

Synopsis

Adds an embedding layer to the neural network in order to replace example identifiers (such as token indices of a text) with multi-dimensional numerical representations. Embeddings are often used to encode existing knowledge directly into the representation.

Description

This operator has to be placed inside the subprocess of the Deep Learning, Deep Learning (Tensor) or Autoencoder operator. It also needs to be the very first layer in the architecture.

An embedding is a relatively low-dimensional space into which one can translate high-dimensional vectors (e.g. 300 dimensions for certain Word2Vec models). Embeddings make it easier to do machine learning on large inputs like sparse vectors representing words. Ideally, an embedding captures some of the semantics of the input by placing (semantically) similar inputs close together in the embedding space. An embedding can be learned and reused across models.

Example: "word" is replaced by its id (e.g. 19) in the embedding model's dictionary (via the 'Index to Embedding' operator outside the network). The embedding layer then converts the id into the given representation: 19 -> (1, 0, ..., 1).
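The two-step mapping described above can be sketched as follows. The dictionary and vector values are purely illustrative, not taken from a real model:

```python
# Step 1 (outside the network, cf. 'Index to Embedding'): token -> id
vocab = {"word": 19}

# Step 2 (the embedding layer): id -> numerical representation
embedding = {19: [0.12, -0.43, 0.87]}

token_id = vocab["word"]        # 19
vector = embedding[token_id]    # the multi-dimensional representation
print(vector)                   # [0.12, -0.43, 0.87]
```

In a real network the second lookup is a row selection in a weight matrix rather than a dictionary lookup, but the effect is the same: each id is replaced by its vector.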

Input

layerArchitecture

A network configuration setup with previous operators. Connect this port to the layer port of the "Deep Learning" operator.

Output

layerArchitecture

The network configuration with this embedding layer as first layer. Connect this port to the next input port of another layer or the layer port on the right side of the "Deep Learning" operator.

Parameters

Sequential embedding

Whether to use the sequential variant of the layer, which specializes in sequential use cases. Enable this option if you are training with a tensor (inside Deep Learning (Tensor)).

Use existing embedding

Embeddings can either be learned or, as is more often the case, reused. Popular models like Word2Vec or GloVe can make the overall training easier and faster, since text representations do not have to be learned over and over again. Use this option if you want to leverage an existing model.

Freeze training

If an existing model was chosen, it might not make sense to keep fine-tuning it. In most cases, the model (i.e. the embedding layer) can instead simply be frozen and ignored during training. This also leads to a consistent mapping of words, especially if the underlying model (e.g. Word2Vec, GloVe) is context independent.

Model type

Select the type of the existing embedding model.

  • WORD2VEC: Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located close to one another in the space.
  • STATIC_MODEL: A static model is a model for distributed word representation. Select this option if the representation is stored in a txt file with one token (as a string) per line, followed by the numbers of its representation (decimal, English notation) separated by spaces, e.g. "word 0.123 0.432 0.445" as one line of the txt file. A prime example is the structure of the GloVe (coined from Global Vectors) model. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. This is achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

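A hypothetical parser for the STATIC_MODEL line format described above (one token followed by space-separated decimal numbers) could look like this; the function name and example line are illustrative:

```python
def parse_static_model_line(line):
    """Parse one line of a static embedding file: 'token 0.1 0.2 ...'."""
    parts = line.strip().split(" ")
    token = parts[0]
    values = [float(v) for v in parts[1:]]
    return token, values

token, vec = parse_static_model_line("word 0.123 0.432 0.445")
print(token, vec)  # word [0.123, 0.432, 0.445]
```

This matches the GloVe distribution format, which is why pre-trained GloVe files can be loaded directly as static models.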
Embedding model file

Provide the path to the model file that contains the embedding weights/parameters.

Index range

Vocabulary size for the embedding. To learn a new embedding, the layer needs to be configured with the maximum number of "tokens" it must be able to map.

Output size

Size of the resulting vector space. The learned embedding maps tokens into this new vector space (e.g. Word2Vec-300 maps textual content, roughly words, into 300-dimensional vectors).
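Together, the index range and output size determine the shape of the layer's weight matrix. A minimal sketch, assuming a randomly initialized (i.e. newly learned, not pre-trained) embedding; the concrete numbers are example values:

```python
import numpy as np

index_range = 10000   # vocabulary size: how many token ids can be mapped
output_size = 300     # dimensionality of the resulting vector space

# The embedding is a weight matrix with one row per token id.
rng = np.random.default_rng(0)
weights = rng.standard_normal((index_range, output_size))

# Looking up a token id selects its row, i.e. its vector representation.
vector = weights[19]
print(vector.shape)   # (300,)
```

Freezing the layer (see 'Freeze training') simply means these rows are never updated during training.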