Process Token Embeddings from Flair Sentence Object

This function extracts the embedding of each token in a Flair sentence object and returns the embeddings as a matrix with one row per token, using the token texts as row names.

Usage

process_embeddings(sentence, verbose = FALSE)

Arguments

sentence: A Flair sentence object containing tokens that have already been embedded. The object should have a 'tokens' attribute, and each token should have both an 'embedding' attribute (exposing a numpy() method) and a 'text' attribute.
verbose: Logical indicating whether to print progress messages. Default is FALSE.

Value

A matrix where:

Each row represents a token's embedding
Row names are the corresponding token texts
Columns represent the dimensions of the embedding vectors
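
The shape of the return value can be illustrated without Flair. The following is a minimal sketch, assuming two made-up tokens with 3-dimensional vectors; the names token_texts and token_vectors are hypothetical and this is not the package's own code.

```r
# Hypothetical stand-in data: two tokens with 3-dimensional embeddings.
token_texts <- c("example", "text")
token_vectors <- list(c(0.1, 0.2, 0.3), c(0.4, 0.5, 0.6))

# Stack the vectors row-wise so that each row is one token's embedding.
embedding_matrix <- do.call(rbind, token_vectors)

# Use the token texts as row names.
rownames(embedding_matrix) <- token_texts

dim(embedding_matrix)        # rows = tokens, columns = embedding dimensions
embedding_matrix["text", ]   # look up one token's vector by its text
```

Because the rows are named, a token's vector can be selected by its text instead of its position.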

Details

The function will throw errors in the following cases:

If sentence is NULL or has no tokens
If any token is missing an embedding
If any token is missing text
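
The checks listed above might look roughly like the sketch below. It assumes a mock sentence built from plain R lists, whereas a real Flair sentence is a Python object; check_sentence and the error messages are hypothetical and only mirror the documented cases.

```r
# Hypothetical helper mirroring the documented checks; not the package's code.
check_sentence <- function(sentence) {
  # Case 1: the sentence is NULL or has no tokens.
  if (is.null(sentence) || length(sentence$tokens) == 0) {
    stop("Sentence is NULL or has no tokens")
  }
  for (token in sentence$tokens) {
    # Case 2: a token has no embedding.
    if (is.null(token$embedding)) stop("Token is missing an embedding")
    # Case 3: a token has no text.
    if (is.null(token$text)) stop("Token is missing text")
  }
  invisible(TRUE)
}

# Mock sentence whose only token lacks an embedding.
bad_sentence <- list(tokens = list(list(text = "example", embedding = NULL)))

# Capture the error message instead of halting the session.
result <- tryCatch(
  check_sentence(bad_sentence),
  error = function(e) conditionMessage(e)
)
```

Wrapping the call in tryCatch() is one way to handle these errors when processing many sentences in a loop.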

Examples

if (FALSE) { # \dontrun{
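# Load the package and get the classes used below
# (assumes the Sentence class is obtained via flair_data())
library(flaiR)
Sentence <- flair_data()$Sentence
WordEmbeddings <- flair_embeddings()$WordEmbeddings

# Create a Flair sentence
sentence <- Sentence("example text")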

# Initialize FastText embeddings trained on Common Crawl
fasttext_embeddings <- WordEmbeddings('en-crawl')

# Apply embeddings
fasttext_embeddings$embed(sentence)

# Process embeddings with timing and messages
embedding_matrix <- process_embeddings(sentence, verbose = TRUE)
} # }