I. Katranis, C. Troussas, A. Krouska, Ph. Mylonas, C. Sgouropoulou
Named Entity Recognition and News Article Classification: A Lightweight Approach
IEEE Access, September 2025
ABSTRACT
This paper introduces TinyGreekNewsBERT, a 14.1M-parameter distilled Transformer that performs both Named Entity Recognition (NER) and multiclass news-topic classification in Greek. We first compile and annotate a 20,000-article corpus with 32 IOB2 entity labels and 19 thematic categories, accompanied by a transparent, reproducible preprocessing pipeline. On this benchmark, TinyGreekNewsBERT reaches 81% micro F1 for NER and 78% classification accuracy, coming within five percentage points of GreekBERT (86% / 83%) while performing comparably to mBERT (82% / 77%) and approaching XLM-RoBERTa (85% / 82%). Crucially, compared with GreekBERT, our model is 8x smaller, requires 15x fewer FLOPs (1.3 BFLOPs at 128 tokens), and yields a median CPU latency of 14.7 ms per article, a 10x speed-up that makes it the first genuinely edge-deployable solution for Greek NER and news classification. Because the distillation and training pipeline is language-agnostic, the approach can be ported to other mid-resource languages and domains, offering a cost-effective path to multilingual, real-time NLP systems.
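For readers unfamiliar with the IOB2 scheme and the micro F1 metric mentioned in the abstract, the following is a minimal sketch of how IOB2 tag sequences can be decoded into entity spans and scored at the entity level. It is not the authors' evaluation code: the function names `iob2_to_spans` and `micro_f1` and the labels `PER` and `LOC` are illustrative assumptions, and the abstract does not state whether the reported score is computed per entity or per token.

```python
from typing import List, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def iob2_to_spans(tags: List[str]) -> List[Span]:
    """Decode one IOB2 tag sequence into entity spans (hypothetical helper)."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label is not None and tag[2:] == label:
            continue  # still inside the current entity
        if label is not None:
            spans.append((label, start, i))  # close the open entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]  # open a new entity
        # An "I-" tag with no matching open entity is ignored (strict IOB2).
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level micro F1: counts are pooled over all sentences and types."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = set(iob2_to_spans(gold_tags))
        pred_spans = set(iob2_to_spans(pred_tags))
        tp += len(gold_spans & pred_spans)  # exact type-and-boundary matches
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Under this sketch a predicted entity counts as correct only when both its type and its token boundaries match the gold annotation exactly, so a model that finds one of two gold entities and predicts nothing else scores a precision of 1.0 and a recall of 0.5.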
01 September, 2025
I. Katranis, C. Troussas, A. Krouska, Ph. Mylonas, C. Sgouropoulou, "Named Entity Recognition and News Article Classification: A Lightweight Approach", IEEE Access, September 2025