This function initializes Flair embeddings from the flair.embeddings module.

Usage

flair_embeddings.FlairEmbeddings(embeddings_type = "news-forward")

Arguments

embeddings_type

A character string specifying the type of embeddings to initialize, for example "news-forward" or "news-backward". See Details for further options.

Value

A FlairEmbeddings object from the flair.embeddings module.

Details

Multi-Language Embeddings:

  • 'multi-X': Supports 300+ languages, sourced from the JW300 corpus, as proposed by Agić and Vulić (2019). The corpus is licensed under CC-BY-NC-SA.

  • 'multi-X-fast': CPU-friendly version, trained on a mix of corpora in languages such as English, German, French, Italian, Dutch, and Polish.

English Embeddings:

  • 'news-X': Trained with a 1 billion word corpus.

  • 'news-X-fast': Trained with a 1 billion word corpus; CPU-friendly.

  • 'mix-X': Trained with a mixed corpus (Web, Wikipedia, Subtitles).

Specific Language Embeddings:

  • 'de-X': German. Trained with a mixed corpus (Web, Wikipedia, Subtitles).

  • 'de-historic-ha-X': German (historical). Added by @stefan-it: Historical German trained on the Hamburger Anzeiger.

  • 'de-historic-wz-X': German (historical). Added by @stefan-it: Historical German trained on the Wiener Zeitung.

  • 'de-historic-rw-X': German (historical). Added by @redewiedergabe: Historical German trained on 100 million tokens.

  • 'de-impresso-hipe-v1-X': In-domain data (Swiss and Luxembourgish newspapers) for the CLEF HIPE shared task. More information on the shared task can be found in this paper.

  • 'no-X': Norwegian. Added by @stefan-it: Trained with Wikipedia/OPUS.

  • 'nl-X': Dutch. Added by @stefan-it: Trained with Wikipedia/OPUS.

  • 'nl-v0-X': Dutch. Added by @stefan-it: LM embeddings (earlier version).

  • 'ja-X': Japanese. Added by @frtacoa: Trained with 439M words of Japanese Web crawls (2048 hidden states, 2 layers)

  • 'fi-X': Finnish. Added by @stefan-it: Trained with Wikipedia/OPUS.

  • 'fr-X': French. Added by @mhham: Trained with French Wikipedia.

Domain-Specific Embeddings:

  • 'es-clinical-X': Spanish (clinical). Added by @matirojasg: Trained with Wikipedia.

  • 'pubmed-X': English. Added by @jessepeng: Trained with 5% of PubMed abstracts until 2015 (1150 hidden states, 3 layers).

The above are examples; make sure you reference the correct embedding name and details for your application. Replace 'X' with either 'forward' or 'backward'. For a comprehensive list of available embeddings, refer to the Flair Embeddings documentation.
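
For example, with 'X' replaced, a language-specific or multilingual model is loaded the same way as the English models above. A minimal sketch (assuming the corresponding model weights can be downloaded in your environment):

flair_embedding_de <- flair_embeddings.FlairEmbeddings("de-forward")
flair_embedding_multi <- flair_embeddings.FlairEmbeddings("multi-backward")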

References

FlairEmbeddings from the Flair Python library. Equivalent Python usage:


from flair.embeddings import FlairEmbeddings
flair_embedding_forward = FlairEmbeddings('news-forward')
flair_embedding_backward = FlairEmbeddings('news-backward')

Examples

if (FALSE) { # \dontrun{
flair_embedding_forward <- flair_embeddings.FlairEmbeddings("news-forward")
flair_embedding_backward <- flair_embeddings.FlairEmbeddings("news-backward")
} # }
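
The returned object can then be used to embed text. A minimal sketch, assuming the companion helper flair_data() is available to expose the flair.data module (including its Sentence class):

if (FALSE) { # \dontrun{
embedding <- flair_embeddings.FlairEmbeddings("news-forward")
# flair_data() is assumed to expose the flair.data module, including Sentence
sentence <- flair_data()$Sentence("The grass is green.")
embedding$embed(sentence)
# Each token now carries a contextual string embedding (a torch tensor)
for (token in sentence$tokens) {
  print(token$embedding)
}
} # }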