The flair.data module provides the core classes for representing and
processing text data in the Flair library. The flair_data() function
gives access to the classes and utilities in the flair.data module,
most notably:
BoundingBox(left, top, right, bottom): Bases: tuple (Python); list (R).
left - str. Alias for field number 0.
top - int. Alias for field number 1.
right - int. Alias for field number 2.
bottom - int. Alias for field number 3.
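The "alias for field number N" wording describes a named tuple: each field can be read by name or by position. The following is a minimal standard-library sketch of that structure, not Flair's own source; the field types follow the list above, and the example values are invented for illustration.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Stand-in with the documented fields; not imported from flair.data."""
    left: str    # field number 0 (documented as str)
    top: int     # field number 1
    right: int   # field number 2
    bottom: int  # field number 3


# Example values are made up for illustration.
box = BoundingBox(left="10", top=20, right=110, bottom=60)

# Name-based and position-based access refer to the same field.
first_field = box[0]
height = box.bottom - box.top
```

Because the class is a tuple, it can also be unpacked positionally, for example `left, top, right, bottom = box`.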
Sentence(text, use_tokenizer=True, language_code=None, start_position=0): A Sentence is a list of tokens and is used to represent a sentence or text fragment.
Sentence can be imported by flair_data()$Sentence via flaiR.
text - Union[str, List[str], List[Token]]. The original string (sentence), or a pre-tokenized list of tokens.
use_tokenizer - Union[bool, Tokenizer]. Specifies a custom tokenizer to split the text into tokens. The default is flair.tokenization.SegTokTokenizer. If use_tokenizer is set to False, flair.tokenization.SpaceTokenizer will be used instead. The tokenizer will be ignored if text refers to pre-tokenized tokens.
language_code - Optional[str]. Language of the sentence. If not provided, langdetect will be called when the language_code is accessed for the first time.
start_position - int. Start character offset of the sentence in the superordinate document.
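To make the roles of text and start_position concrete, here is a rough standard-library sketch. It does not call Flair: `sketch_tokens` is a hypothetical helper, splitting on whitespace is a simplification of the tokenizers described above, and the single-space spacing assumed for pre-tokenized input is an assumption made only so that offsets can be computed.

```python
import re
from typing import List, Tuple, Union


def sketch_tokens(
    text: Union[str, List[str]], start_position: int = 0
) -> List[Tuple[str, int]]:
    """Return (token, document-level character offset) pairs.

    Hypothetical helper for illustration; not part of flair or flaiR.
    """
    if isinstance(text, list):
        # Pre-tokenized input: no tokenizer is applied.
        # Assumption: tokens are separated by exactly one space.
        pairs, offset = [], start_position
        for token in text:
            pairs.append((token, offset))
            offset += len(token) + 1
        return pairs
    # Plain string: split on runs of whitespace (a simplification).
    return [
        (match.group(), start_position + match.start())
        for match in re.finditer(r"\S+", text)
    ]


from_string = sketch_tokens("The grass is green", start_position=100)
from_list = sketch_tokens(["The", "grass"])
```

Here the sentence is assumed to begin at character 100 of a larger document, so every token offset is shifted by that amount.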
Span(tokens, tag=None, score=1.0): Bases: _PartOfSentence. A Span is a slice of a Sentence, consisting of a list of Tokens.
Span can be imported by flair_data()$Span.
Token(text, head_id=None, whitespace_after=1, start_position=0, sentence=None): This class represents one word in a tokenized sentence. Each token may have any number of tags. It may also point to its head in a dependency tree.
Token can be imported by flair_data()$Token via flaiR.
Corpus(train=None, dev=None, test=None, name='corpus', sample_missing_splits=True): Represents a collection of sentences, facilitating operations like splitting into train/test/development sets and applying transformations. It is particularly useful for training and evaluating models on custom datasets.
Corpus can be imported by flair_data()$Corpus via flaiR.
Dictionary: Represents a mapping between items and indices. It is useful for converting text into machine-readable formats.
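The item-to-index mapping described for Dictionary can be illustrated with a small standard-library sketch. `IndexMap` and its method names are hypothetical and are not Flair's Dictionary API; the sketch only shows the idea of assigning each distinct item a stable integer index and looking it up in both directions.

```python
from typing import Dict, List


class IndexMap:
    """Hypothetical two-way item/index mapping; not flair.data.Dictionary."""

    def __init__(self) -> None:
        self._item_to_index: Dict[str, int] = {}
        self._index_to_item: List[str] = []

    def add(self, item: str) -> int:
        """Register an item if it is new and return its index."""
        if item not in self._item_to_index:
            self._item_to_index[item] = len(self._index_to_item)
            self._index_to_item.append(item)
        return self._item_to_index[item]

    def index_of(self, item: str) -> int:
        return self._item_to_index[item]

    def item_at(self, index: int) -> str:
        return self._index_to_item[index]


labels = IndexMap()
# Repeated items reuse the index they were first given.
encoded = [labels.add(tag) for tag in ["PER", "LOC", "PER", "ORG"]]
decoded = [labels.item_at(i) for i in encoded]
```

Encoding a tag sequence to integers and decoding it back returns the original sequence, which is the property a model relies on when converting labels to and from numeric form.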
