Abstract:
Anaphora is used frequently in written text and spoken conversation. Anaphora
resolution is the process of finding the antecedent of an anaphor. It is a
challenging and complex problem in Natural Language Processing (NLP) for the
Afaan Oromo language. The majority of NLP applications, including machine
translation, information extraction, question answering, and text summarization, need
proper identification and resolution of anaphora. Several works have addressed
anaphora resolution in English and other languages; however,
very few have addressed it for the Afaan Oromo
language, owing to the lack of resources. Most researchers have applied a rule-based approach to anaphora resolution in Afaan Oromo, while no researchers have
used a machine learning approach for the anaphora resolution task. In this study, we
developed a pronominal anaphora resolution model for the Afaan Oromo
language using conditional random fields (CRF), a machine learning approach. The
model handles both intra-sentential and inter-sentential anaphora. We
developed our model using the CRF++ 0.58 tool. Afaan Oromo texts were collected from
different sources, such as the Afaan Oromo Holy Bible, BBC Afaan Oromo news, and Afaan
Oromo grade 9 and 11 student textbooks, to evaluate the performance of the model.
In total, 1330 sentences with 12571 tokens were collected for both independent and
hidden anaphors. Of this prepared dataset, 80% was
used for training and the remaining 20% for testing. The
CRF-based model for Afaan Oromo pronominal anaphora resolution obtained
a precision of 78.87%, a recall of 91.80%, and an F-measure of 84.85% for the resolution of independent anaphors. For the resolution of hidden
anaphors, it obtained a precision of 80.41%, a recall of 95.12%, and an F-measure of
87.15%.
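
As a quick consistency check on the reported scores, the sketch below recomputes the F-measure from precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` is illustrative and not taken from the study.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
independent = f_measure(78.87, 91.80)
hidden = f_measure(80.41, 95.12)

print(f"independent anaphors: {independent:.2f}%")
print(f"hidden anaphors: {hidden:.2f}%")
```

Under this assumption, both recomputed values agree with the reported F-measures of 84.85% and 87.15% to two decimal places.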