Automatic Thesaurus Construction for Afaan Oromo Text Retrieval

Lencho, Jambare

Institute of Technology, Department of Computer Science (Theses and Dissertations)

URI: http://hdl.handle.net/123456789/3163

Date: 2021-09

Abstract:

Information retrieval (IR) involves accessing information items to satisfy the information needs of users, and an IR system should let users retrieve the information they are interested in with little effort. In keyword-based information retrieval, a document that does not contain one or more of the words specified by the user is not ranked at all. Yet users often choose query terms or keywords that differ from the index terms used in the document database. As a result, they may not get the required search results even though relevant documents are available in the collection. It is therefore important to expand query terms with additional related terms drawn from a thesaurus. Although a thesaurus can be constructed manually, automatic construction is preferred, because manual construction requires highly skilled subject-domain experts and is extremely labor-intensive and time-consuming. Using text corpora, however, a thesaurus can be generated automatically with little human intervention. In this research, an automatic thesaurus for Afaan Oromo text retrieval was developed. The system has two main components: automatic thesaurus construction and text retrieval. The continuous bag-of-words (CBOW) and skip-gram models were used to build the thesaurus automatically from Afaan Oromo documents. The experiments indicate that the skip-gram model generated more related terms. Because the skip-gram model produced more promising related terms, the thesaurus it generated was used to evaluate the Afaan Oromo text retrieval system. The system was tested with and without the thesaurus. Without the thesaurus, it achieved an average precision of 43% and an average recall of 39.2%; with the thesaurus, average precision and recall rose to 65.6% and 76%, respectively.
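The pipeline summarized above (a thesaurus built from word-embedding neighbours, thesaurus-based query expansion, and evaluation by precision and recall) can be illustrated with a minimal Python sketch. It assumes the word vectors have already been trained: the five toy vectors are hypothetical stand-ins for skip-gram embeddings, and the function names `build_thesaurus`, `expand_query`, `retrieve` and `precision_recall`, the neighbour count `k`, the similarity threshold `min_sim`, and the documents `d1` to `d3` are illustrative assumptions, not details taken from the thesis. Retrieval here is a simple "contains at least one query term" match, not the thesis's ranking model.

```python
"""Illustrative sketch only: toy vectors and documents are hypothetical."""
import math

# Hypothetical 3-d vectors standing in for trained skip-gram embeddings.
VECTORS = {
    "barnoota": [0.9, 0.1, 0.0],     # education
    "barataa": [0.8, 0.2, 0.1],      # student
    "barsiisaa": [0.85, 0.15, 0.05], # teacher
    "qonna": [0.1, 0.9, 0.1],        # agriculture
    "midhaan": [0.05, 0.85, 0.2],    # crops
}

# Hypothetical document collection: each document is a set of index terms.
DOCS = {
    "d1": {"barnoota", "barsiisaa"},
    "d2": {"barataa"},
    "d3": {"qonna", "midhaan"},
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_thesaurus(vectors, k=2, min_sim=0.8):
    """Map each term to at most k nearest neighbours with similarity >= min_sim."""
    thesaurus = {}
    for term, vec in vectors.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != term),
            reverse=True,  # most similar first
        )
        thesaurus[term] = [other for sim, other in scored[:k] if sim >= min_sim]
    return thesaurus


def expand_query(query_terms, thesaurus):
    """Append each query term's thesaurus entries, skipping duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


def retrieve(query_terms, docs):
    """Return IDs of documents sharing at least one term with the query."""
    query = set(query_terms)
    return sorted(doc_id for doc_id, terms in docs.items() if terms & query)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    thesaurus = build_thesaurus(VECTORS)
    relevant = ["d1", "d2"]
    query = ["barnoota"]
    # Without the thesaurus only d1 matches.
    print(precision_recall(retrieve(query, DOCS), relevant))  # (1.0, 0.5)
    # Expansion adds "barsiisaa" and "barataa", so d2 is also retrieved.
    expanded = expand_query(query, thesaurus)
    print(precision_recall(retrieve(expanded, DOCS), relevant))  # (1.0, 1.0)
```

With real data the vectors would come from a trained model. In gensim, for example, `Word2Vec` selects skip-gram with `sg=1` and CBOW with `sg=0`, and `model.wv.most_similar(term, topn=k)` returns a term's nearest neighbours, which could take the place of `build_thesaurus`; the abstract does not say which toolkit was actually used.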

Show full item record