The flair.data module provides essential utilities for text data processing and representation in the Flair library. This function gives access to various classes and utilities in the flair.data module, most notably:
BoundingBox(left, top, right, bottom): Bases: tuple (Python); list (R).
left - str. Alias for field number 0.
top - int. Alias for field number 1.
right - int. Alias for field number 2.
bottom - int. Alias for field number 3.
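The "alias for field number N" wording describes named-tuple behavior on the Python side: each field can be read by name or by position. The following is a minimal sketch of that behavior, assuming a stand-in class (BoundingBoxSketch is a hypothetical name, not the Flair class) with the field order and types listed above and made-up values.

```python
from typing import NamedTuple


class BoundingBoxSketch(NamedTuple):
    """Hypothetical stand-in that mirrors the documented field order and types."""

    left: str
    top: int
    right: int
    bottom: int


# Illustrative values only; they do not come from any real document layout.
box = BoundingBoxSketch(left="10", top=20, right=110, bottom=60)

# Each name is an alias for a positional field of the underlying tuple.
first_field = box[0]      # same value as box.left
last_field = box[3]       # same value as box.bottom

# Because it is a tuple, it can also be unpacked in field order.
left, top, right, bottom = box
```

Reading by name keeps calling code readable, while positional access and unpacking keep the object compatible with anything that expects a plain tuple.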
Sentence(text, use_tokenizer=True, language_code=None, start_position=0): A Sentence is a list of tokens and is used to represent a sentence or text fragment. Sentence can be imported via flaiR with flair_data()$Sentence.
text - Union[str, List[str], List[Token]]. The original string (sentence), or a pre-tokenized list of tokens.
use_tokenizer - Union[bool, Tokenizer]. Specify a custom tokenizer to split the text into tokens. The default is flair.tokenization.SegTokTokenizer. If use_tokenizer is set to False, flair.tokenization.SpaceTokenizer will be used instead. The tokenizer will be ignored if text refers to pre-tokenized tokens.
language_code - Optional[str]. Language of the sentence. If not provided, langdetect will be called when the language_code is accessed for the first time.
start_position - int. Start character offset of the sentence in the superordinate document.
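The interaction between text and use_tokenizer can be pictured as a small dispatch. The sketch below is not Flair's implementation: resolve_tokens, split_on_spaces, and split_words_and_punctuation are hypothetical helpers, and the regular expression is only a rough stand-in for a rule-based tokenizer such as SegTokTokenizer.

```python
import re
from typing import Callable, List, Union


def split_on_spaces(text: str) -> List[str]:
    """Stand-in for a space tokenizer: split on spaces, drop empty pieces."""
    return [piece for piece in text.split(" ") if piece]


def split_words_and_punctuation(text: str) -> List[str]:
    """Rough stand-in for a rule-based tokenizer: words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def resolve_tokens(
    text: Union[str, List[str]],
    use_tokenizer: Union[bool, Callable[[str], List[str]]] = True,
) -> List[str]:
    """Hypothetical helper showing which tokenization path is taken."""
    if isinstance(text, list):
        # Pre-tokenized input: the tokenizer argument is ignored.
        return list(text)
    if use_tokenizer is True:
        # Default path (stand-in for the default tokenizer).
        return split_words_and_punctuation(text)
    if use_tokenizer is False:
        # Fallback path (stand-in for the space tokenizer).
        return split_on_spaces(text)
    # Otherwise a custom tokenizer callable was supplied.
    return use_tokenizer(text)


default_tokens = resolve_tokens("Hello, world!")
space_tokens = resolve_tokens("Hello, world!", use_tokenizer=False)
given_tokens = resolve_tokens(["Hello", "world"], use_tokenizer=False)
```

Under these assumptions, the default path separates punctuation from words, the space path leaves punctuation attached to its neighbor, and a pre-tokenized list passes through unchanged.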
Span(tokens, tag=None, score=1.0): Bases: _PartOfSentence. A Span is a slice of a Sentence, consisting of a list of Tokens. Span can be imported with flair_data()$Span.
Token(text, head_id=None, whitespace_after=1, start_position=0, sentence=None): This class represents one word in a tokenized sentence. Each token may have any number of tags. It may also point to its head in a dependency tree. Token can be imported via flaiR with flair_data()$Token.
Corpus(train=None, dev=None, test=None, name='corpus', sample_missing_splits=True): Represents a collection of sentences, facilitating operations like splitting into train/test/development sets and applying transformations. It is particularly useful for training and evaluating models on custom datasets. Corpus can be imported via flaiR with flair_data()$Corpus.
Dictionary: Represents a mapping between items and indices. It is useful for converting text into machine-readable formats.
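To make the idea of an item-to-index mapping concrete, here is a minimal sketch of a two-way lookup. ItemIndex and its methods (add, index_of, item_at) are hypothetical names chosen for illustration; they are not the Flair Dictionary API.

```python
from typing import Dict, List


class ItemIndex:
    """Hypothetical two-way mapping between string items and integer indices."""

    def __init__(self) -> None:
        self._item_to_index: Dict[str, int] = {}
        self._index_to_item: List[str] = []

    def add(self, item: str) -> int:
        """Register an item if it is new and return its index."""
        if item not in self._item_to_index:
            self._item_to_index[item] = len(self._index_to_item)
            self._index_to_item.append(item)
        return self._item_to_index[item]

    def index_of(self, item: str) -> int:
        """Look up the index assigned to a known item."""
        return self._item_to_index[item]

    def item_at(self, index: int) -> str:
        """Look up the item stored at a given index."""
        return self._index_to_item[index]


labels = ItemIndex()
encoded = [labels.add(tag) for tag in ["PER", "LOC", "PER", "ORG"]]
decoded = [labels.item_at(i) for i in encoded]
```

Indices are assigned in order of first appearance, so repeated items reuse the same index and the encoding can be reversed without loss.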