Afaan Oromo Pronominal Anaphora Resolution Using Conditional Random Fields

Bikiltu, Guteta

URI: http://hdl.handle.net/123456789/3142

Date: 2021-06

Abstract:

Anaphora is used frequently in written text and spoken conversation. Anaphora resolution is the process of finding the antecedent of an anaphor, and it is a challenging and complex problem in Natural Language Processing (NLP) for the Afaan Oromo language. Most NLP applications, such as machine translation, information extraction, question answering, and text summarization, need anaphora to be identified and resolved correctly. Although several works address anaphora resolution in English and other languages, very few address it for Afaan Oromo, owing to the lack of resources. Prior work on Afaan Oromo has mostly applied a rule-based approach, and no researchers have used a machine learning approach for the anaphora resolution task. In this study, we developed a pronominal anaphora resolution model for the Afaan Oromo language using conditional random fields (CRF), a machine learning approach. The model handles both intra-sentential and inter-sentential anaphora. We built the model with the CRF++ 0.58 tool. To evaluate its performance, we collected Afaan Oromo texts from several sources: the Afaan Oromo Holy Bible, BBC Afaan Oromo news, and Afaan Oromo grade 9 and grade 11 student textbooks. In total, 1330 sentences with 12571 tokens were collected, covering both independent and hidden anaphors. Of this prepared dataset, 80% was used for training and the remaining 20% for testing. For the resolution of independent anaphors, the CRF model obtained a precision of 78.87%, a recall of 91.80%, and an F-measure of 84.85%. For the resolution of hidden anaphors, it obtained a precision of 80.41%, a recall of 95.12%, and an F-measure of 87.15%.
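The reported F-measures are consistent with the usual balanced F-measure, the harmonic mean of precision and recall. The abstract does not state which formula was used, so the sketch below assumes that definition; the function name f_measure is illustrative and is not taken from the thesis or from CRF++.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall).

    Assumption: the thesis uses the standard F1 definition; the
    abstract does not say so explicitly.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values (in percent) as reported in the abstract.
independent = f_measure(78.87, 91.80)
hidden = f_measure(80.41, 95.12)

print(f"independent anaphors: {independent:.2f}%")
print(f"hidden anaphors:      {hidden:.2f}%")
```

Under this assumption, both computed values round to the figures given in the abstract (84.85% and 87.15%).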
