Initializing a Class for TransformerDocumentEmbeddings
Source: R/flair_embeddings.R
flair_embeddings.TransformerDocumentEmbeddings.Rd
This function interfaces with Python via reticulate to create a flair_embeddings.TransformerDocumentEmbeddings object from the flair.embeddings module.
Usage
flair_embeddings.TransformerDocumentEmbeddings(
model = "bert-base-uncased",
layers = "all",
subtoken_pooling = "mean",
fine_tune = FALSE,
allow_long_sentences = TRUE,
memory_efficient = NULL,
use_context = FALSE
)
Arguments
- model
A character string specifying the pre-trained transformer model to use, e.g. "bert-base-uncased" or "gpt2-medium". It can also be a path to a locally stored pre-trained model. Defaults to "bert-base-uncased".
- layers
(Optional) A character string specifying which layers of the transformer model to use for the embedding. You can specify a single layer (e.g. "-1") or multiple layers (e.g. "-1,-2,-3,-4"). If set to "-1,-2,-3,-4", the top four layers are used; if set to "-1", only the last layer is used; if set to "all", all layers are used.
- subtoken_pooling
(Optional) Method of pooling used to combine subtokens (word pieces) into one embedding for the original token. Options are 'first' (use the first subtoken), 'last' (use the last subtoken), 'first_last' (concatenate the first and last subtokens), and 'mean' (average all subtokens).
- fine_tune
Logical. Indicates if fine-tuning should be done. Defaults to FALSE.
- allow_long_sentences
Logical. Allows longer sentences to be processed. Defaults to TRUE. Some transformer models (such as BERT) have a maximum sequence length; by default, Flair truncates sentences that exceed it. If this option is set to TRUE, Flair instead splits long sentences into smaller parts and averages the resulting embeddings.
- memory_efficient
(Optional) Logical. Enables memory-efficient mode in the underlying transformer. When set to TRUE, less memory is used, but processing may be slower.
- use_context
Logical. Whether to consider the surrounding context in some processing step. Default is FALSE.
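As a sketch of how these arguments combine, the call below requests the top four layers, concatenates first and last subtoken embeddings, and enables fine-tuning. This assumes the flaiR package is attached and a working flair Python environment is available via reticulate.

```r
# A minimal sketch; assumes flaiR and a configured flair Python backend.
library(flaiR)

embedding <- flair_embeddings.TransformerDocumentEmbeddings(
  model = "bert-base-uncased",
  layers = "-1,-2,-3,-4",        # top four transformer layers
  subtoken_pooling = "first_last", # concatenate first and last subtokens
  fine_tune = TRUE                 # allow model weights to be updated
)
```

With fine_tune = TRUE, the transformer weights are updated during downstream training rather than kept frozen, which typically improves task accuracy at the cost of more compute.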
Details
This function provides an interface for R users to easily access and utilize the power of Flair's TransformerDocumentEmbeddings. It bridges the gap between Python's Flair library and R, enabling R users to leverage state-of-the-art NLP models.
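An end-to-end sketch of embedding a document from R is shown below. Note that retrieving the Sentence class via flair_data() is an assumption about how flaiR exposes the flair.data module; a working Python installation of flair is required.

```r
# A hypothetical end-to-end sketch; assumes flaiR exposes flair.data
# through flair_data() and that the Python flair library is installed.
library(flaiR)

# Retrieve the Sentence constructor from the Python flair.data module.
Sentence <- flair_data()$Sentence

# Create the document embedding with default settings.
embedding <- flair_embeddings.TransformerDocumentEmbeddings("bert-base-uncased")

# Wrap a text in a Sentence object and embed it.
sentence <- Sentence("R users can access Flair's transformer embeddings.")
embedding$embed(sentence)

# The document-level embedding is attached to the sentence object.
sentence$embedding
```

Because the returned objects are reticulate proxies for Python objects, their methods and attributes (such as embed() and $embedding) follow the Python Flair API.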
References
In Python's Flair library:
from flair.embeddings import TransformerDocumentEmbeddings
embedding = TransformerDocumentEmbeddings('bert-base-uncased')
See also
Flair's official GitHub repository: https://github.com/flairNLP/flair