Flair is a Python NLP library developed by Zalando Research that stands out for being notably user-friendly. It provides intuitive interfaces and strong multilingual support, and it works with a range of embeddings, from GloVe to transformer-based models hosted on Hugging Face.
NLP Tasks
For R users, flaiR extends Flair NLP with three NLP task functions that return extracted features in a tidy data.table, so you don’t have to write loops to format the parsed output yourself. The main features are part-of-speech tagging, named entity recognition, and sentiment analysis. To reduce the load on RAM with larger corpora, flaiR also supports batch processing: texts are handled in batches, which helps optimize memory usage and performance on large datasets.
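The idea behind batch processing can be illustrated with a minimal base R sketch. This is not flaiR's implementation: `process_in_batches()`, its `batch_size` argument, and the toy `count_chars()` worker are hypothetical names used only for illustration. The point is that texts are handled a few at a time, so only one batch's intermediate results are in flight at once.

```r
# Hypothetical helper (not part of flaiR): apply `fun` to `texts` in chunks
# of `batch_size`, then bind the per-batch results into one data frame.
process_in_batches <- function(texts, fun, batch_size = 2) {
  # Assign each text to a batch index: 1, 1, 2, 2, 3, ...
  batch_id <- ceiling(seq_along(texts) / batch_size)
  batches <- split(texts, batch_id)
  # Process one batch at a time
  results <- lapply(batches, fun)
  # Combine the per-batch data frames and reset the row names
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Toy stand-in for a tagger: returns one row per text
count_chars <- function(batch) {
  data.frame(text = batch, n_chars = nchar(batch), stringsAsFactors = FALSE)
}

texts <- c("Flair is fun.", "R meets Python.", "Batching saves memory.")
batched <- process_in_batches(texts, count_chars, batch_size = 2)
print(batched)
```

In practice the worker would be a tagging call rather than `nchar()`, and the batch size would be tuned to the memory available.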
Core Featured Functions | Loader | Supported Models from Flair NLP |
---|---|---|
get_entities(), get_entities_batch() | load_tagger_ner() | en (English), fr (French), da (Danish), nl (Dutch), and more. |
get_pos(), get_pos_batch() | load_tagger_pos() | pos (English POS), fr-pos (French POS), de-pos (German POS), nl-pos (Dutch POS), and more. |
get_sentiments(), get_sentiments_batch() | load_tagger_sentiments() | sentiment (English), sentiment-fast (English), de-offensive-language (German offensive language detection model) |
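As a rough sketch of how a loader and a task function from the table fit together, the wrapper below loads the English NER tagger and tags a few texts. The wrapper name `tag_entities()` is hypothetical, and the argument names `texts`, `doc_ids`, and `tagger` passed to `get_entities()` are assumptions; please check the flaiR reference for the exact signature before relying on them.

```r
# Hypothetical wrapper (not part of flaiR) combining a loader and a task function.
tag_entities <- function(texts, doc_ids) {
  # "en" is one of the model keys listed for load_tagger_ner() in the table
  tagger_ner <- flaiR::load_tagger_ner("en")
  # Assumed argument names: texts, doc_ids, tagger
  flaiR::get_entities(texts = texts, doc_ids = doc_ids, tagger = tagger_ner)
}

# Usage (requires flaiR and a working Flair installation; the model is
# downloaded on first use):
# results <- tag_entities(
#   texts   = c("Zalando Research is based in Berlin.", "Flair runs on PyTorch."),
#   doc_ids = c("doc1", "doc2")
# )
# print(results)
```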
Training and Fine-tuning
In flaiR, we use simple S3 functions to wrap the major Flair modules. Once loaded from Flair NLP, every module works much like an R6 object in the R environment: the functions and methods of a Python class (both often just called functions in R) are accessed with the $ operator instead of Python's dot. For example, from flair.trainers import ModelTrainer in Python is equivalent to ModelTrainer <- flair_trainers()$ModelTrainer in the R environment with flaiR.
Wrapped Flair NLP Modules with S3 | Corresponding Python Code When Loading Modules from Flair NLP |
---|---|
flair_datasets() | from flair.datasets import * |
flair_nn() | from flair.nn import * |
flair_splitter() | from flair.splitter import * |
flair_trainers() | from flair.trainers import * |
flair_models() | from flair.models import * |
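The mapping in the table can be sketched as follows. `ModelTrainer` comes from the example in the text; `SequenceTagger` is a class in Python's flair.models and is used here only as a second illustration. The `requireNamespace()` guard is an addition so the snippet does nothing when flaiR is not installed.

```r
# Runs only when flaiR is installed; flaiR itself needs a working Python
# environment with Flair NLP available.
if (requireNamespace("flaiR", quietly = TRUE)) {
  library(flaiR)

  # Python: from flair.trainers import ModelTrainer
  ModelTrainer <- flair_trainers()$ModelTrainer

  # Python: from flair.models import SequenceTagger
  SequenceTagger <- flair_models()$SequenceTagger
}
```

The objects on the left-hand side are references to the Python classes, so their methods are likewise reached with `$` from R.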
More Details about Installation
The installation consists of two parts. First, install Python 3.8 or higher and R 3.6.3 or higher. Although we have tested flaiR on GitHub Actions with R 3.6.2, we strongly recommend installing R 4.0.0 or above to ensure compatibility between the R environment and Python. Second, install flaiR. When first installed, flaiR automatically detects whether you have Python 3.8 or higher. If not, it skips the automatic installation of Python and Flair NLP; in that case, you will need to install them manually and then reload flaiR. If Python 3.8 or higher is already installed, the flaiR installer automatically installs the Flair Python library in your global environment.
If you are using {reticulate}, {flaiR} will typically assume the r-reticulate environment by default. You can use py_config() to check the location of your environment. Please note that flaiR installs Flair NLP directly into the Python environment that your R session is using. This environment can be changed in RStudio by navigating to Tools -> Global Options -> Python. If there are any issues with the installation, feel free to ask in the Discussion.
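To see which Python environment R is using and whether Flair is visible from it, a small check along these lines may help. `check_flair_env()` is a hypothetical helper, not a flaiR function; it relies only on `py_config()` and `py_module_available()` from {reticulate}.

```r
# Hypothetical helper (not part of flaiR): summarise the Python environment
# that reticulate has bound to this R session.
check_flair_env <- function() {
  config <- reticulate::py_config()
  list(
    python          = config$python,   # path to the Python binary in use
    version         = config$version,  # Python version string
    # TRUE if `import flair` would succeed in this environment
    flair_available = reticulate::py_module_available("flair")
  )
}

# Usage (requires the reticulate package and a Python installation):
# str(check_flair_env())
```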
For stable usage, we strongly recommend installing these specific versions.
OS | R Versions | Python Version |
---|---|---|
Mac | 4.3.2, 4.2.0, 4.2.1 | 3.10.x |
Mac | Latest | 3.9 |
Windows | 4.0.5 | 3.10.x |
Windows | Latest | 3.9 |
Ubuntu | 4.3.2, 4.2.0, 4.2.1 | 3.10.x |
Ubuntu | Latest | 3.9 |
During installation, you will see numerous messages about setting up the Python environment and the Python flair module. Flair has many dependencies, including torch, tokenizers, transformers, and gensim, so the installation might take some time to complete.
There is one more scenario to consider. If flaiR is unable to install Flair and PyTorch automatically, it will attempt to force the installation again. If this second attempt also fails, you will see the message: “Failed to install Flair. flaiR requires Flair NLP. Please ensure Flair NLP is installed in Python manually.” If you are on a Mac with Apple silicon (M1/M2), check at this point that your Python and PyTorch builds are compatible with the chip.
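# Install the remotes package, which can install packages from GitHub
install.packages("remotes")
# Install flaiR from GitHub; force = TRUE reinstalls even if it is already present
remotes::install_github("davidycliao/flaiR", force = TRUE)
# Load flaiR
library(flaiR)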
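If the automatic installation fails and Flair has to be installed manually, one option is to do it from R with {reticulate}. The helper below is a sketch under that assumption: `install_flair_manually()` is a hypothetical name, and `envname = NULL` lets reticulate pick its default environment. Running `pip install flair` in a terminal for the same Python environment is an equivalent route.

```r
# Hypothetical helper (not part of flaiR): install the Python flair module
# into the environment reticulate uses, then report whether it can be imported.
install_flair_manually <- function(envname = NULL) {
  # pip = TRUE installs from PyPI with pip rather than conda
  reticulate::py_install("flair", envname = envname, pip = TRUE)
  # TRUE if the module is now importable; a restart of R may be needed first
  reticulate::py_module_available("flair")
}

# Usage (requires the reticulate package and Python 3.8 or higher):
# install_flair_manually()
# library(flaiR)
```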