Journal Article, Open Access

BenCoref: A Multi-Domain Dataset of Nominal Phrases and Pronominal Reference Annotations

Authors

Shadman Rohan, Mojammel Hossain, Mohammad Mamun Ur Rashid, Nabeel Mohammed

Author Affiliations

North South University, Jahangirnagar University

Year

2023

Citations

2

DOI

10.18653/v1/2023.law-1.11

Abstract

Coreference resolution is a well-studied problem in NLP. While it has been widely studied for English and other resource-rich languages, coreference resolution in Bengali remains largely unexplored due to the absence of relevant datasets. Bengali, a low-resource language, is morphologically richer than English. In this article, we introduce a new dataset, BenCoref, comprising coreference annotations for Bengali texts gathered from four distinct domains. This relatively small dataset contains 5,200 mention annotations forming 502 mention clusters within 48,569 tokens. We describe the process of creating this dataset and report the performance of multiple models trained using BenCoref. We anticipate that our work sheds some light on the variations in coreference phenomena across multiple domains in Bengali and encourages the…

Fields & Keywords

Physical Sciences; Computer Science; Artificial Intelligence; Natural Language Processing Techniques; Topic Modeling; Text Readability and Simplification; Natural language processing; Artificial intelligence; Mathematical analysis; Management