2502.04245


#1 TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi

Authors: Mohammed Amaan Dhamaskar, Rasika Ransing

India's rich cultural and linguistic diversity poses various challenges in the domain of Natural Language Processing (NLP), particularly in Named Entity Recognition (NER). NER is an NLP task that aims to identify and classify tokens into different entity groups such as Person, Location, Organization and Number. This makes NER very useful for downstream tasks such as context-aware anonymization. This paper details our work to build a multilingual NER model for the three most spoken languages in India: Hindi, Bengali and Marathi. We train a custom transformer model and fine-tune a few pretrained models, achieving an F1 score of 92.11 for a total of 6 entity groups. Through this paper, we aim to introduce a single model to perform NER and significantly reduce the inconsistencies in entity groups and tag names across the three languages.
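To make the anonymization use case concrete, here is a minimal sketch of how token-level NER output could drive redaction. It is not the authors' pipeline: the `anonymize` function, the BIO tagging scheme, and the `PER`/`LOC` tag names are all assumptions chosen for illustration, and the tags are supplied by hand rather than predicted by a model.

```python
from typing import List, Optional


def anonymize(tokens: List[str], tags: List[str]) -> List[str]:
    """Collapse each tagged entity span into one [GROUP] placeholder.

    Assumes BIO-style tags ("B-PER", "I-PER", "O"); this scheme and the
    tag names are illustrative, not taken from the paper.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    out: List[str] = []
    prev_group: Optional[str] = None  # group of the span being collapsed

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Non-entity tokens pass through unchanged.
            out.append(token)
            prev_group = None
            continue

        prefix, _, group = tag.partition("-")
        # Emit a placeholder when a span starts: either an explicit B- tag,
        # or an I- tag that does not continue the previous group.
        if prefix == "B" or group != prev_group:
            out.append(f"[{group}]")
        prev_group = group

    return out


# Hand-written example; a real system would take tags from an NER model.
tokens = ["Asha", "Rao", "lives", "in", "Pune", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(" ".join(anonymize(tokens, tags)))  # [PER] lives in [LOC] .
```

Because the function only looks at the tag strings, the same code would apply to Hindi, Bengali or Marathi text as long as one consistent tag set is used across languages, which is the kind of consistency the abstract argues for.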

Subjects: Computation and Language, Artificial Intelligence, Machine Learning

Published: 2025-02-06 17:37:36 UTC