This function interfaces with Python via reticulate to create a TransformerDocumentEmbeddings object from Python's flair.embeddings module.

Usage

flair_embeddings.TransformerDocumentEmbeddings(
  model = "bert-base-uncased",
  layers = "all",
  subtoken_pooling = "mean",
  fine_tune = FALSE,
  allow_long_sentences = TRUE,
  memory_efficient = NULL,
  use_context = FALSE
)

Arguments

model

A character string specifying the pre-trained model to use. Defaults to "bert-base-uncased". This can be the name of a transformer model (e.g., "bert-base-uncased", "gpt2-medium") or a path to a locally stored pre-trained model.

layers

(Optional) A string specifying which layers of the transformer model are used for the embedding. Layers can be given by index, singly or comma-separated: for example, "-1" uses only the last layer, "-1,-2,-3,-4" uses the top four layers, and "1,2,3" uses the first three. Setting it to "all" (the default) uses every layer. A short configuration sketch follows the Arguments list.

subtoken_pooling

(Optional) Method of pooling to handle subtokens. This determines how subtokens (word pieces) are pooled into one embedding for the original token. Options are 'first' (use first subtoken), 'last' (use last subtoken), 'first_last' (concatenate first and last subtokens), and 'mean' (average all subtokens).

fine_tune

Logical. Whether the transformer weights should be fine-tuned during downstream training. Defaults to FALSE.

allow_long_sentences

Logical. Allows longer sentences to be processed. Defaults to TRUE. Transformer models such as BERT have a maximum sequence length, and Flair normally truncates sentences that exceed it. With this option set to TRUE, Flair instead splits long sentences into smaller parts and averages the resulting embeddings.

memory_efficient

(Optional) Logical. Enables a memory-efficient mode in the transformers backend. When set to TRUE, less memory is used, but processing might be slower. Defaults to NULL.

use_context

Logical. Whether to use surrounding context from the document when computing the embedding. Default is FALSE.
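
As a quick illustration of how these arguments combine, here is a hedged sketch of a few common configurations (the model names are only examples; the calls use the wrapper shown under Usage):

# Default: all layers, mean subtoken pooling, no fine-tuning
emb_default <- flair_embeddings.TransformerDocumentEmbeddings("bert-base-uncased")

# Top four layers, concatenating the first and last subtokens
emb_top4 <- flair_embeddings.TransformerDocumentEmbeddings(
  model = "bert-base-uncased",
  layers = "-1,-2,-3,-4",
  subtoken_pooling = "first_last"
)

# Last layer only; long texts are split into chunks and averaged
emb_last <- flair_embeddings.TransformerDocumentEmbeddings(
  model = "roberta-base",
  layers = "-1",
  allow_long_sentences = TRUE
)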

Value

A Flair TransformerDocumentEmbeddings object (a Python class instance from flair.embeddings), accessible in R via reticulate.

Details

This function provides an interface for R users to easily access and utilize the power of Flair's TransformerDocumentEmbeddings. It bridges the gap between Python's Flair library and R, enabling R users to leverage state-of-the-art NLP models.
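
To make the bridge concrete, the hedged sketch below shows one way to use the returned Python object from R. It relies on reticulate and on Flair's own Python API (flair.data.Sentence, the embed() method, and get_embedding()); anything beyond the wrapper call itself is an assumption about that API rather than part of this package.

library(reticulate)

# Create the document-level embedding through the wrapper
embedding <- flair_embeddings.TransformerDocumentEmbeddings("bert-base-uncased")

# Build a Sentence object directly from Flair's Python API
flair <- import("flair")
sentence <- flair$data$Sentence("reticulate lets R call Python's Flair library.")

# Embed the whole document; the vector is attached to the Sentence object
embedding$embed(sentence)

# Retrieve the document embedding as a torch tensor
doc_vec <- sentence$get_embedding()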

References

In Python's Flair library:


from flair.embeddings import TransformerDocumentEmbeddings
embedding = TransformerDocumentEmbeddings('bert-base-uncased')

See also

Flair's official GitHub repository: https://github.com/flairNLP/flair

Examples

if (FALSE) { # \dontrun{
embedding <- flair_embeddings.TransformerDocumentEmbeddings("bert-base-uncased")
} # }
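
A slightly longer, hedged variant: convert the resulting torch tensor into a plain R numeric vector and check its dimensionality. detach(), cpu(), and numpy() are standard torch tensor methods, and embedding_length is an attribute of Flair embedding classes; treat the exact conversion path as an assumption to verify in your setup.

if (FALSE) { # \dontrun{
library(reticulate)
embedding <- flair_embeddings.TransformerDocumentEmbeddings("bert-base-uncased")
flair <- import("flair")
sentence <- flair$data$Sentence("Document embeddings summarise a whole text.")
embedding$embed(sentence)

# Bring the embedding into R as a numeric vector; reticulate converts
# the NumPy array returned by numpy() into an R array automatically.
vec <- as.numeric(sentence$get_embedding()$detach()$cpu()$numpy())

length(vec)                  # dimensionality of the document vector
embedding$embedding_length   # the same number, reported by Flair itself
} # }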