This function tags the parts of speech (POS) in the given texts and returns the tags, together with related data, as a data.table.
Usage
get_pos(
  texts,
  doc_ids = NULL,
  tagger = NULL,
  language = NULL,
  show.text_id = FALSE,
  gc.active = FALSE
)
Arguments
- texts
A character vector containing texts to be processed.
- doc_ids
A character vector of document IDs corresponding to texts.
- tagger
A POS tagger object, such as one returned by load_tagger_pos() (default is NULL).
- language
The language of the texts (default is NULL).
- show.text_id
A logical value. If TRUE, adds the text from which each token was extracted to the resulting data table. This aids verification and traceability but may increase the size of the output. Default is FALSE.
- gc.active
A logical value. If TRUE, runs the garbage collector after all texts have been processed, which can free unused memory, especially when processing a large number of texts. Default is FALSE.
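For a corpus too large to tag comfortably in one call, one option is to split it into batches and pass gc.active = TRUE to each call, which may help keep memory use bounded. The sketch below relies only on the get_pos() signature shown under Usage; tag_in_batches() and its batch_size and pos_fn parameters are hypothetical and not part of flaiR. pos_fn exists only so that a stand-in function can replace get_pos() for testing.

```r
# Hypothetical helper, not part of flaiR: tags `texts` in consecutive
# batches of at most `batch_size` texts and stacks the results.
# `pos_fn` defaults to get_pos(); any function accepting
# (texts, doc_ids, tagger, ...) can be substituted, e.g. a stub for testing.
tag_in_batches <- function(texts, doc_ids, tagger, batch_size = 100,
                           pos_fn = get_pos, ...) {
  stopifnot(length(texts) == length(doc_ids), batch_size >= 1)

  # Map positions 1..n to batch numbers 1, 1, ..., 2, 2, ...
  batches <- split(seq_along(texts), ceiling(seq_along(texts) / batch_size))

  # Tag each batch separately; extra arguments such as gc.active = TRUE
  # are forwarded unchanged through `...`
  results <- lapply(batches, function(idx) {
    pos_fn(texts = texts[idx], doc_ids = doc_ids[idx], tagger = tagger, ...)
  })

  # Stack the per-batch tables into a single table
  do.call(rbind, unname(results))
}
```

With flaiR loaded and a tagger from load_tagger_pos(), a call might look like tag_in_batches(texts, doc_ids, tagger_pos_fast, batch_size = 50, gc.active = TRUE).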
Value
A data.table containing the following columns:
doc_id
The document identifier corresponding to each text.
token_id
The position of the token in the original text.
text_id
The text input passed to the function; included only when show.text_id = TRUE.
token
The individual word or token from the text that was POS tagged.
tag
The part-of-speech tag assigned to the token by the Flair library.
precision
A confidence score (numeric) for the assigned POS tag.
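Because a data.table is also a data.frame, the output can be filtered and summarised with base R. The sketch below builds a small stand-in table with the columns listed above (text_id omitted). Its values are made up, and the Penn Treebank-style tags are chosen only for illustration. It drops tags scored below an arbitrary 0.9 threshold and counts the remaining tags per document.

```r
# Made-up rows that mimic the documented output columns (text_id omitted);
# the tags and scores are illustrative, not real tagger output.
pos <- data.frame(
  doc_id    = c("doc1", "doc1", "doc1", "doc1", "doc2", "doc2"),
  token_id  = c(1L, 2L, 3L, 9L, 1L, 2L),
  token     = c("UCD", "is", "one", "Ireland", "Essex", "is"),
  tag       = c("NNP", "VBZ", "CD", "NNP", "NNP", "VBZ"),
  precision = c(0.99, 1.00, 0.62, 0.98, 0.97, 1.00),
  stringsAsFactors = FALSE
)

# Drop low-confidence tags; 0.9 is an arbitrary example threshold
confident <- pos[pos$precision >= 0.9, ]

# Count tokens per document and tag
tag_counts <- aggregate(token ~ doc_id + tag, data = confident, FUN = length)
names(tag_counts)[names(tag_counts) == "token"] <- "n"
print(tag_counts)
```

In practice, pos would be replaced by the data.table returned from get_pos().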
Examples
if (FALSE) { # \dontrun{
library(reticulate)
library(flaiR)
tagger_pos_fast <- load_tagger_pos('pos-fast')
texts <- c("UCD is one of the best universities in Ireland.",
           "Essex is not in the Russell Group, but it is famous for political science research.",
           "TCD is the oldest university in Ireland.")
doc_ids <- c("doc1", "doc2", "doc3")
get_pos(texts, doc_ids, tagger_pos_fast)
} # }