
This function tags each token in the given texts with a part-of-speech (POS) tag and returns the results, along with related token-level data, as a data.table.

Usage

get_pos(
  texts,
  doc_ids = NULL,
  tagger = NULL,
  language = NULL,
  show.text_id = FALSE,
  gc.active = FALSE
)

Arguments

texts

A character vector containing texts to be processed.

doc_ids

A character vector of document IDs, one for each element of texts (default is NULL).

tagger

A Flair POS tagger object, for example one returned by load_tagger_pos() (default is NULL).

language

The language of the texts (default is NULL).

show.text_id

A logical value. If TRUE, includes the original text from which each token was taken in the resulting data table (the text_id column). This is useful for verification and traceability, but increases the size of the output because the text is repeated on every token row. Default is FALSE.
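When show.text_id is left at FALSE, the text can still be attached later by joining on doc_id. The sketch below is a minimal base-R illustration, assuming a result with the documented doc_id, token_id, token and tag columns; the tables `pos` and `docs` are hypothetical, hand-written stand-ins, not real tagger output.

```r
# Hypothetical stand-in for a get_pos() result with show.text_id = FALSE;
# rows and tags are hand-written for illustration only.
pos <- data.frame(
  doc_id   = c("doc1", "doc1", "doc2", "doc2"),
  token_id = c(1, 2, 1, 2),
  token    = c("UCD", "is", "Essex", "is"),
  tag      = c("NNP", "VBZ", "NNP", "VBZ")
)

# Hypothetical lookup table: one row per document, mapping doc_id to its text
docs <- data.frame(
  doc_id = c("doc1", "doc2"),
  text   = c("UCD is one of the best universities in Ireland.",
             "Essex is not in the Russell Group, but it is famous for political science research.")
)

# Attach each document's text to its token rows only when it is needed
pos_with_text <- merge(pos, docs, by = "doc_id")

# Restore document/token order after the join
pos_with_text <- pos_with_text[order(pos_with_text$doc_id, pos_with_text$token_id), ]
print(pos_with_text)
```

Keeping the texts in a separate per-document table stores each text once rather than once per token, which is the size trade-off that show.text_id controls.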

gc.active

A logical value. If TRUE, runs the garbage collector after all texts have been processed, which can release unused memory, especially when processing a large number of texts. Default is FALSE.

Value

A data.table containing the following columns:

doc_id

The document identifier corresponding to each text.

token_id

The position of the token within its original text.

text_id

The original input text from which the token was taken. Included only when show.text_id = TRUE.

token

The individual word or token from the text that was POS tagged.

tag

The part-of-speech tag assigned to the token by the Flair library.

precision

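The tagger's confidence score (numeric) for the assigned POS tag. Despite the column name, this is a per-token confidence value, not the precision evaluation metric.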
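A common next step is to drop low-confidence tags and count the remaining tags per document. The sketch below assumes only the columns documented above; the table `pos` and its confidence values are made up for illustration, and the 0.9 threshold is an arbitrary choice, not a recommendation from the package.

```r
# Hypothetical stand-in for a get_pos() result; tags and
# confidence values are made up for illustration.
pos <- data.frame(
  doc_id    = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  token_id  = c(1, 2, 3, 1, 2),
  token     = c("UCD", "is", "one", "Essex", "is"),
  tag       = c("NNP", "VBZ", "CD", "NNP", "VBZ"),
  precision = c(0.99, 0.98, 0.62, 0.97, 0.99)
)

# Keep only tags whose confidence score is at least 0.9
confident <- subset(pos, precision >= 0.9)

# Count how often each tag occurs in each document
tag_counts <- as.data.frame(
  table(doc_id = confident$doc_id, tag = confident$tag),
  responseName = "n"
)

# Drop doc/tag combinations that never occur
tag_counts <- tag_counts[tag_counts$n > 0, ]
print(tag_counts)
```

Because get_pos() returns a data.table, the same subset() call works on the result directly; data.table's own syntax, pos[precision >= 0.9], is an equivalent alternative.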

Examples

if (FALSE) { # \dontrun{
library(reticulate)
library(flaiR)

# Load the 'pos-fast' Flair POS tagger
tagger_pos_fast <- load_tagger_pos('pos-fast')

texts <- c("UCD is one of the best universities in Ireland.",
           "Essex is not in the Russell Group, but it is famous for political science research.",
           "TCD is the oldest university in Ireland.")
doc_ids <- c("doc1", "doc2", "doc3")

# Tag the texts; the result has one row per token
get_pos(texts, doc_ids, tagger = tagger_pos_fast)
} # }