Tagging Sentiment with Flair Standard Models
David (Yen-Chieh) Liao
Postdoc at Text & Policy Research Group and SPIRe in UCD
Source: vignettes/get_sentiments.Rmd
An Example Using the sentiment Model (Pre-trained English Model)
Download the English sentiment model from FlairNLP on Hugging Face. In addition to this standard model, flaiR currently also supports a large English sentiment model and a German pre-trained model.
tagger_sent <- load_tagger_sentiments("sentiment")
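The other models mentioned above are loaded the same way, by passing a different model name to load_tagger_sentiments(). The identifier below is an assumption based on Flair NLP's standard model names; verify it (and the name of the large English model) against the flaiR documentation before use.
tagger_de <- load_tagger_sentiments("de-offensive-language")  # German model; name assumed, not verified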
Flair NLP is built on the PyTorch framework, so we can use the $to method to set the device for the Flair Python library. flair_device() lets you choose whether to run on the CPU ("cpu"), on CUDA devices (such as cuda:0, cuda:1, cuda:2), or on specific MPS devices on Mac (such as mps:0, mps:1, mps:2). For information on Accelerated PyTorch training on Mac, see https://developer.apple.com/metal/pytorch/; for more about CUDA, see https://developer.nvidia.com/cuda-zone.
tagger_sent$to(flair_device("mps"))
TextClassifier(
  (embeddings): TransformerDocumentEmbeddings(
    (model): DistilBertModel(
      (embeddings): Embeddings(
        (word_embeddings): Embedding(30522, 768, padding_idx=0)
        (position_embeddings): Embedding(512, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (transformer): Transformer(
        (layer): ModuleList(
          (0-5): 6 x TransformerBlock(
            (attention): MultiHeadSelfAttention(
              (dropout): Dropout(p=0.1, inplace=False)
              (q_lin): Linear(in_features=768, out_features=768, bias=True)
              (k_lin): Linear(in_features=768, out_features=768, bias=True)
              (v_lin): Linear(in_features=768, out_features=768, bias=True)
              (out_lin): Linear(in_features=768, out_features=768, bias=True)
            )
            (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (ffn): FFN(
              (dropout): Dropout(p=0.1, inplace=False)
              (lin1): Linear(in_features=768, out_features=3072, bias=True)
              (lin2): Linear(in_features=3072, out_features=768, bias=True)
              (activation): GELUActivation()
            )
            (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          )
        )
      )
    )
  )
  (decoder): Linear(in_features=768, out_features=2, bias=True)
  (dropout): Dropout(p=0.0, inplace=False)
  (locked_dropout): LockedDropout(p=0.0)
  (word_dropout): WordDropout(p=0.0)
  (loss_function): CrossEntropyLoss()
)
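The same call works for the other devices mentioned above; for example, to run on the first CUDA GPU or to fall back to the CPU:
tagger_sent$to(flair_device("cuda:0"))  # first CUDA device
tagger_sent$to(flair_device("cpu"))     # plain CPU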
# tag each speech in uk_immigration; returns document ids, predicted labels, and confidence scores
results <- get_sentiments(uk_immigration$text,
                          seq_len(nrow(uk_immigration)),
                          tagger_sent)
print(results)
#> doc_id sentiment score
#> <int> <char> <num>
#> 1: 1 POSITIVE 0.8097584
#> 2: 2 POSITIVE 0.9990165
#> 3: 3 POSITIVE 0.8827484
#> 4: 4 NEGATIVE 0.9997155
#> 5: 5 POSITIVE 0.8604351
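get_sentiments() returns a data.table, so the results can be filtered and summarized with standard data.table syntax; for example:
library(data.table)
# keep only documents tagged NEGATIVE with high confidence
results[sentiment == "NEGATIVE" & score > 0.99]
# count documents per predicted label
results[, .N, by = sentiment]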
Batch Processing with the English Sentiment Model
Processing texts one at a time is inefficient, while processing all of them at once can exceed memory constraints, especially if each document in the dataset is sizable. Processing the documents in smaller batches offers a compromise between these two extremes: it improves throughput and keeps memory usage under control.
By default, the batch_size parameter is set to 5. You can start with this default value and then experiment with different batch sizes to find the one that works best for your specific use case, monitoring memory usage and processing time to guide the decision. If you have access to a GPU, you might also try larger batch sizes to take advantage of GPU parallelism. However, be cautious not to set the batch size too large, as this can lead to out-of-memory errors. Ultimately, the choice of batch size should balance memory constraints, processing efficiency, and the specific requirements of your sentiment tagging task.
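A minimal sketch of such an experiment, assuming the same data and tagger as above and timing each run with base R's system.time() (the candidate batch sizes are arbitrary):
for (bs in c(2, 5, 10)) {
  timing <- system.time(
    get_sentiments_batch(uk_immigration$text,
                         seq_len(nrow(uk_immigration)),
                         tagger_sent,
                         batch_size = bs,
                         verbose = FALSE)
  )
  cat("batch_size =", bs, "->", timing[["elapsed"]], "seconds\n")
}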
batch_process_results <- get_sentiments_batch(uk_immigration$text,
                                              uk_immigration$speaker,
                                              tagger_sent,
                                              show.text_id = FALSE,
                                              batch_size = 2,
                                              verbose = TRUE)
#> CPU is used.
#> Processing batch 1 out of 3...
#> Processing batch 2 out of 3...
#> Processing batch 3 out of 3...
print(batch_process_results)
#> doc_id sentiment score
#> <char> <char> <num>
#> 1: Philip Hollobone POSITIVE 0.8097584
#> 2: Stewart Jackson POSITIVE 0.9990165
#> 3: Philip Hollobone POSITIVE 0.8827485
#> 4: Stewart Jackson NEGATIVE 0.9997155
#> 5: Philip Hollobone POSITIVE 0.8604351
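Because doc_id in this call holds the speaker names rather than row numbers, the returned data.table can be summarized by speaker; a brief sketch:
# average confidence and number of speeches per speaker and predicted label
batch_process_results[, .(mean_score = mean(score), n = .N),
                      by = .(doc_id, sentiment)]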