Initializing a Class for Flair's Forward and Backward Embeddings
Source: R/flair_embeddings.R
This function initializes Flair embeddings from the flair.embeddings module.
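For example, a minimal sketch using the name scheme described under Details (replace 'news-forward' with any of the names listed below):

library(flaiR)

# Load a forward character-level language model;
# "news-forward" is 'news-X' with X = 'forward' (see Details).
embedding <- flair_embeddings.FlairEmbeddings("news-forward")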
Details
Multi-Language Embeddings:
'multi-X': Supports 300+ languages, trained on the JW300 corpus proposed by Agić and Vulić (2019). The corpus is licensed under CC-BY-NC-SA.
'multi-X-fast': CPU-friendly version, trained on a mix of corpora in English, German, French, Italian, Dutch, and Polish (see the sketch below).
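For example, a sketch substituting X = 'forward' in the names above:

# Multi-language forward model (JW300, 300+ languages)
multi_embedding <- flair_embeddings.FlairEmbeddings("multi-forward")

# CPU-friendly variant of the multi-language model
multi_embedding_fast <- flair_embeddings.FlairEmbeddings("multi-forward-fast")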
English Embeddings:
'news-X': Trained with a 1-billion-word corpus.
'news-X-fast': Trained with a 1-billion-word corpus; CPU-friendly.
'mix-X': Trained with a mixed corpus (Web, Wikipedia, Subtitles); a stacking sketch follows this list.
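Forward and backward models are typically combined for downstream tasks. A sketch, assuming flair_embeddings() returns the underlying flair.embeddings module (as the description above states), so its StackedEmbeddings class can be reached with $:

StackedEmbeddings <- flair_embeddings()$StackedEmbeddings

# Stack the forward and backward 1-billion-word models
stacked <- StackedEmbeddings(list(
  flair_embeddings.FlairEmbeddings("news-forward"),
  flair_embeddings.FlairEmbeddings("news-backward")
))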
Specific Language Embeddings:
'de-X': German. Trained with a mixed corpus (Web, Wikipedia, Subtitles).
'de-historic-ha-X': German (historical). Added by @stefan-it: Historical German trained on the Hamburger Anzeiger.
'de-historic-wz-X': German (historical). Added by @stefan-it: Historical German trained on the Wiener Zeitung.
'de-historic-rw-X': German (historical). Added by @redewiedergabe: Historical German trained on over 100 million tokens.
'de-impresso-hipe-v1-X': German (historical). In-domain data (Swiss and Luxembourgish newspapers) for the CLEF HIPE shared task.
'no-X': Norwegian. Added by @stefan-it: Trained with Wikipedia/OPUS.
'nl-X': Dutch. Added by @stefan-it: Trained with Wikipedia/OPUS.
'nl-v0-X': Dutch. Added by @stefan-it: LM embeddings (earlier version).
'ja-X': Japanese. Added by @frtacoa: Trained with 439M words of Japanese Web crawls (2048 hidden states, 2 layers)
'fi-X': Finnish. Added by @stefan-it: Trained with Wikipedia/OPUS.
'fr-X': French. Added by @mhham: Trained with French Wikipedia (see the sketch below).
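For instance, a sketch loading two of the German models above:

# Contemporary German, 'de-X' with X = 'forward'
de_embedding <- flair_embeddings.FlairEmbeddings("de-forward")

# Historical German (Hamburger Anzeiger)
de_historic <- flair_embeddings.FlairEmbeddings("de-historic-ha-forward")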
Domain-Specific Embeddings:
'es-clinical-X': Spanish (clinical). Added by @matirojasg: Trained with Wikipedia.
'pubmed-X': English. Added by @jessepeng: Trained with 5% of PubMed abstracts until 2015 (1150 hidden states, 3 layers).
The above are examples; be sure to reference the correct embedding name and details for your application. Replace 'X' with either 'forward' or 'backward'. For a comprehensive list of embeddings, refer to the Flair Embeddings Documentation.
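As an end-to-end sketch, the snippet below embeds a sentence with a forward and a backward model and prints the per-token vector shapes. It assumes that flaiR's flair_data() accessor exposes Flair's Sentence class; that accessor is not documented on this page.

library(flaiR)

# Forward and backward English models ('news-X')
embedding_forward <- flair_embeddings.FlairEmbeddings("news-forward")
embedding_backward <- flair_embeddings.FlairEmbeddings("news-backward")

# Assumption: flair_data() exposes the flair.data module (Sentence class)
Sentence <- flair_data()$Sentence
sentence <- Sentence("The grass is green .")

# Each embed() call attaches its vectors to the sentence's tokens
embedding_forward$embed(sentence)
embedding_backward$embed(sentence)

for (token in sentence$tokens) {
  print(token$embedding$shape)
}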