Extracts the token embeddings from a Flair sentence object and returns them as a matrix whose row names are the token texts. The function reads each token's embedding, retrieves the token texts, and combines the embeddings into a single matrix.

Usage

process_embeddings(sentence, verbose = FALSE)

Arguments

sentence

A Flair sentence object containing tokens that have already been embedded. The sentence object should have a 'tokens' attribute, and each token should have both an 'embedding' attribute (with a numpy() method) and a 'text' attribute.

verbose

Logical indicating whether to print progress messages. Default is FALSE.

Value

A matrix where:

  • Each row represents a token's embedding

  • Row names are the corresponding token texts

  • Columns represent the dimensions of the embedding vectors
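To make the shape of the return value concrete, the snippet below builds a small stand-in matrix by hand and shows how it can be indexed. The values and the three-dimensional width are made up for illustration; a real call returns one column per dimension of whichever embedding model was applied.

```r
# Mock stand-in for the return value: 2 tokens, 3 embedding dimensions.
# The numbers are invented for illustration and do not come from a real model.
embedding_matrix <- matrix(
  c(0.1, 0.2, 0.3,
    0.4, 0.5, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("example", "text"), NULL)
)

dim(embedding_matrix)        # number of tokens x embedding dimensions
rownames(embedding_matrix)   # token texts
embedding_matrix["text", ]   # embedding vector for the token "text"
```

Because the row names are the token texts, a token's vector can be looked up by name rather than by position. If a sentence contains the same token more than once, name-based lookup returns only the first match, so positional indexing is safer in that case.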

Details

The function will throw errors in the following cases:

  • If sentence is NULL or has no tokens

  • If any token is missing an embedding

  • If any token is missing text
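The checks listed above can be sketched in plain R. This is a hypothetical illustration, not the package source: `sketch_process_embeddings()` and `mock_token()` are invented names, the error messages are placeholders, and the mock tokens are ordinary R lists standing in for Flair objects, so the NULL checks here may differ from how the real function detects a missing embedding.

```r
# Hypothetical sketch, NOT the package source: mirrors the documented checks.
sketch_process_embeddings <- function(sentence) {
  if (is.null(sentence) || length(sentence$tokens) == 0) {
    stop("Sentence is NULL or has no tokens")
  }
  rows <- lapply(sentence$tokens, function(token) {
    if (is.null(token$embedding)) stop("Token is missing an embedding")
    if (is.null(token$text)) stop("Token is missing text")
    # Convert the embedding to a plain numeric vector
    as.numeric(token$embedding$numpy())
  })
  # Stack one row per token and label rows with the token texts
  out <- do.call(rbind, rows)
  rownames(out) <- vapply(sentence$tokens, function(token) token$text, character(1))
  out
}

# Mock token built from plain R lists; numpy() just returns the stored vector.
mock_token <- function(text, values) {
  list(text = text, embedding = list(numpy = function() values))
}

ok <- list(tokens = list(mock_token("example", c(1, 2)),
                         mock_token("text", c(3, 4))))
sketch_process_embeddings(ok)

# A token without an embedding triggers an error, which is caught here
bad <- list(tokens = list(list(text = "example", embedding = NULL)))
tryCatch(sketch_process_embeddings(bad), error = function(e) conditionMessage(e))
```

Wrapping the call in `tryCatch()` as in the last line is one way to handle these errors when processing many sentences in a loop, so that a single malformed sentence does not stop the whole run.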

Examples

if (FALSE) { # \dontrun{
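# Load flaiR and bring in the Flair classes used below
library(flaiR)
Sentence <- flair_data()$Sentence
WordEmbeddings <- flair_embeddings()$WordEmbeddings

# Create a Flair sentence
sentence <- Sentence("example text")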

# Initialize FastText embeddings trained on Common Crawl
fasttext_embeddings <- WordEmbeddings('en-crawl')

# Apply embeddings
fasttext_embeddings$embed(sentence)

# Process embeddings with timing and messages
embedding_matrix <- process_embeddings(sentence, verbose = TRUE)
} # }