
The flair.data module provides the core data structures for text processing and representation in the Flair library. flair_data() returns this module, giving access to its classes and utilities, most notably:

  • BoundingBox(left, top, right, bottom): Bases: tuple (Python); list (R)

    • left - str. Alias for field number 0.

    • top - int. Alias for field number 1.

    • right - int. Alias for field number 2.

    • bottom - int. Alias for field number 3.

  • Sentence(text, use_tokenizer=True, language_code=None, start_position=0): A Sentence is a list of tokens and is used to represent a sentence or text fragment. In flaiR, access it as flair_data()$Sentence.

    • text Union[str, List[str], List[Token]] - The original string (sentence), or a pre-tokenized list of tokens.

    • use_tokenizer Union[bool, Tokenizer] - Controls how the text is split into tokens. Pass a Tokenizer instance to use a custom tokenizer. True (the default) uses flair.tokenization.SegtokTokenizer; False uses flair.tokenization.SpaceTokenizer instead. The tokenizer is ignored if text is already a pre-tokenized list of tokens.

    • language_code Optional[str] - Language of the sentence. If not provided, langdetect will be called when the language_code is accessed for the first time.

    • start_position int - Start character offset of the sentence in the superordinate document.

  • Span(tokens, tag=None, score=1.0): Bases: _PartOfSentence. A Span is a slice of a Sentence, consisting of a list of Tokens. In flaiR, access it as flair_data()$Span.

  • Token(text, head_id=None, whitespace_after=1, start_position=0, sentence=None): Represents one word in a tokenized sentence. Each token may have any number of tags, and it may also point to its head in a dependency tree. In flaiR, access it as flair_data()$Token.

  • Corpus(train=None, dev=None, test=None, name='corpus', sample_missing_splits=True): Represents a collection of sentences organized into train, dev, and test splits, and supports operations such as splitting and applying transformations. It is particularly useful for training and evaluating models on custom datasets. In flaiR, access it as flair_data()$Corpus.

  • Dictionary: Represents a mapping between items and indices. It is useful for converting text into machine-readable formats.
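The item-to-index mapping described for Dictionary can be illustrated with a minimal, standard-library-only Python sketch. ToyDictionary and its methods add_item and get_item are hypothetical names chosen for this illustration; this is not Flair's implementation and does not import flair.

```python
from typing import Dict, List


class ToyDictionary:
    """Hypothetical stand-in for an item <-> index mapping (not flair.data.Dictionary)."""

    def __init__(self) -> None:
        self.item_to_index: Dict[str, int] = {}
        self.index_to_item: List[str] = []

    def add_item(self, item: str) -> int:
        # Assign the next free index to unseen items; reuse the index otherwise.
        if item not in self.item_to_index:
            self.item_to_index[item] = len(self.index_to_item)
            self.index_to_item.append(item)
        return self.item_to_index[item]

    def get_item(self, index: int) -> str:
        # Reverse lookup: index back to the original item.
        return self.index_to_item[index]


labels = ToyDictionary()
# Encode a sequence of labels as integers; repeated labels share one index.
encoded = [labels.add_item(tag) for tag in ["PER", "LOC", "PER", "ORG"]]
# Decode the integers back into labels.
decoded = [labels.get_item(i) for i in encoded]
print(encoded)  # [0, 1, 0, 2]
print(decoded)  # ['PER', 'LOC', 'PER', 'ORG']
```

Keeping both a dict (item to index) and a list (index to item) makes lookups cheap in both directions, which is the property that makes such a mapping useful for turning text labels into machine-readable integers.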

Usage

flair_data()

Value

A Python module (flair.data). Use the $ operator to access its classes and utilities, for example flair_data()$Sentence.

References

Python reference:


from flair.data import Sentence
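On the Python side, the BoundingBox listing describes each field as an alias for a tuple position. The sketch below shows what that means using a stand-in class built with typing.NamedTuple; ToyBoundingBox is a hypothetical name, its field order and types are copied from the listing above, and it does not import flair.

```python
from typing import NamedTuple


class ToyBoundingBox(NamedTuple):
    """Hypothetical stand-in mirroring the documented field order and types."""

    left: str    # field number 0
    top: int     # field number 1
    right: int   # field number 2
    bottom: int  # field number 3


box = ToyBoundingBox(left="10", top=20, right=110, bottom=60)

# Each named field is an alias for a positional index.
same_by_name_and_index = (
    box.left == box[0]
    and box.top == box[1]
    and box.right == box[2]
    and box.bottom == box[3]
)

# Because it is a tuple, it unpacks positionally and is immutable.
left, top, right, bottom = box
height = bottom - top
print(same_by_name_and_index, height)
```

This tuple behaviour is also why the listing gives the base as tuple in Python.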


Examples

if (FALSE) { # \dontrun{
Sentence <- flair_data()$Sentence
Token <- flair_data()$Token
Corpus <- flair_data()$Corpus
} # }