Batch Processing of Sentiment Tagging with Flair Models
Source: R/get_sentiments_batch.R
This function takes texts and their associated document IDs and predicts the sentiment of each text in batches, using a sentiment model from the flair Python library.
Usage
get_sentiments_batch(
texts,
doc_ids,
tagger = NULL,
...,
language = NULL,
show.text_id = FALSE,
gc.active = FALSE,
batch_size = 5,
device = "cpu",
verbose = FALSE
)
Arguments
- texts
A list or vector of texts for which sentiment prediction is to be made.
- doc_ids
A list or vector of document IDs corresponding to the texts.
- tagger
An optional flair sentiment model. If NULL (default), the function loads the default model based on the language.
- ...
Additional arguments passed on to other methods.
- language
A character string indicating the language of the texts, used to select the default model when tagger is NULL. Currently supports "sentiment" (English), "sentiment-fast" (English), and "de-offensive-language" (German).
- show.text_id
A logical value. If TRUE, includes the actual text from which the sentiment was predicted. Default is FALSE.
- gc.active
A logical value. If TRUE, runs the garbage collector after processing all texts. This can help in freeing up memory by releasing unused memory space, especially when processing a large number of texts. Default is FALSE.
- batch_size
An integer specifying the number of texts to be processed at once. Batching can improve performance, particularly on GPUs, by running inference on several texts in one pass. Default is 5.
- device
A character string specifying the computation device. It can be either "cpu" or a string representation of a GPU device number; for instance, "0" corresponds to the first GPU. If a GPU device number is provided, the function will attempt to use that GPU. The default is "cpu". See the sketch after this list for a usage example.
"cuda" or "cuda:0" ("mps" or "mps:0" on Mac M1/M2): refers to the first GPU in the system. If there is only one GPU, specifying "cuda" or "cuda:0" allocates computations to this GPU.
"cuda:1" ("mps:1"): refers to the second GPU in the system, allowing allocation of specific computations to this GPU.
"cuda:2" ("mps:2"): refers to the third GPU in the system, and so on for systems with more GPUs.
- verbose
A logical value. If TRUE, the function prints progress updates for each batch. Default is FALSE.
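The device and batch_size arguments together control where and how inference runs. A minimal sketch of selecting a GPU (the GPU index "0" and the batch size of 32 are illustrative assumptions; use "cpu" if no GPU is present, or "mps:0" on Apple Silicon):

library(flaiR)
texts <- c("The new library building is fantastic.",
           "The commute to campus is exhausting.")
doc_ids <- c("d1", "d2")
tagger_sent <- load_tagger_sentiments("sentiment")
# Larger batches reduce per-batch overhead; the device string follows the table above.
results <- get_sentiments_batch(
  texts, doc_ids, tagger_sent,
  batch_size = 32,
  device = "cuda:0",
  verbose = TRUE
)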
Value
A data.table containing three columns:
doc_id: the document ID from the input.
sentiment: the predicted sentiment label for the text.
score: the confidence score for the sentiment prediction.
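Because each row is keyed by doc_id, the result can be merged back onto the input texts with standard data.table operations. A minimal sketch (the label string "POSITIVE" and the 0.9 cutoff are illustrative assumptions about the English "sentiment" model's output):

library(flaiR)
library(data.table)
texts <- c("The seminar was excellent.",
           "The registration process was frustrating.")
doc_ids <- c("d1", "d2")
tagger_sent <- load_tagger_sentiments("sentiment")
results <- get_sentiments_batch(texts, doc_ids, tagger_sent)
# Join predictions back onto the original texts by document ID.
inputs <- data.table(doc_id = doc_ids, text = texts)
merged <- merge(inputs, results, by = "doc_id")
# Keep only confident positive predictions (the cutoff is illustrative).
merged[sentiment == "POSITIVE" & score > 0.9]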
Examples
if (FALSE) { # \dontrun{
library(flaiR)
texts <- c("UCD is one of the best universities in Ireland.",
"UCD has a good campus but is very far from my apartment in Dublin.",
"Essex is famous for social science research.",
"Essex is not in the Russell Group, but it is famous for political science research.",
"TCD is the oldest university in Ireland.",
"TCD is similar to Oxford.")
doc_ids <- c("doc1", "doc2", "doc3", "doc4", "doc5", "doc6")
# Load pre-trained sentiment ("sentiment") model
tagger_sent <- load_tagger_sentiments('sentiment')
results <- get_sentiments_batch(texts, doc_ids, tagger_sent, batch_size = 3)
print(results)
} # }
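A variant of the example above that keeps the source text in the output and runs the garbage collector after processing (a sketch reusing the texts, doc_ids, and tagger_sent objects defined above):

if (FALSE) { # \dontrun{
results_with_text <- get_sentiments_batch(
  texts, doc_ids, tagger_sent,
  batch_size = 3,
  show.text_id = TRUE,
  gc.active = TRUE
)
print(results_with_text)
} # }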