dc.description.abstract |
Automatic relation extraction is the task of identifying the relations that hold between
entities. It is a subcomponent of information extraction, alongside named entity recognition,
coreference and anaphora resolution, temporal and event detection, and template filling.
Among these information extraction components, this study focuses only on automatic relation
extraction between entities in Amharic text, using a supervised machine learning approach.
Once named entities are identified, relation extraction is the second step and sub-task of
information extraction. For instance, the sentence “Engineer Ktaw Ejigu is the chief scientist
of the American space science” contains a person-affiliation relation between the person
Engineer Ktaw Ejigu and the organization American space science. Here the words “the
chief scientist of” express the relation between the person and organization entities. The
named entities targeted are of the predefined types person, location and organization, and the
relations to be extracted from the text are likewise predefined relations that hold between
those entities. A relation extraction system developed for English or other foreign-language
text cannot be applied to Ethiopian Amharic text or text in other languages, because the
languages differ in nature and structure. This study
prepared its own corpus from the archive of the Walta Information Center website, yielding
30,466 words (tokens) from 2,000 sentences. To create the model of the
system, this work used supervised machine learning. Two models were created,
using support vector machine (SVM) and conditional random field (CRF) learning. In the SVM
model experiments, the stochastic gradient descent classifier achieved precision 49%, recall
10% and F1-score 13%; the passive aggressive classifier achieved precision 55%, recall 41%
and F1-score 48%; and the multinomial Naïve Bayes classifier scored highest among
the tested algorithms, with precision 60%, recall 41% and F1-score 48%.
Using conditional random field, however, the system achieved precision 87%, recall 87%
and F-score 86%. Based on this performance, CRF is the algorithm selected in this work
to train and create the model in the proposed architecture for automatic relation extraction
between entities in Amharic text. |
en_US |