Abstract:
Information retrieval (IR) involves accessing information items to satisfy an information
seeker's needs. An IR system should give users convenient access to the information that interests them. In
keyword-based information retrieval, a document that does not contain one or more of the words specified by the user is not ranked at all. Users often use query terms
or keywords that differ from the index terms used in the document collection. As a
result, they may not get the required search results even though relevant documents are available in the
collection. Therefore, it becomes important to expand query terms with additional related
terms drawn from a thesaurus. Although a thesaurus can be constructed manually, automatic thesaurus construction is preferred, because manual construction requires
highly skilled experts in the subject domain and is extremely labor-intensive and
time-consuming. By contrast, a thesaurus can be generated automatically from text corpora without much
human intervention. In this research work, an automatic thesaurus for Afaan Oromo
text retrieval was developed. The developed system has two main components: automatic
thesaurus construction and text retrieval. CBOW and skip-gram models were used to build
the thesaurus automatically from Afaan Oromo documents. The experiments indicate that the skip-gram model generated more related terms. Since promising related terms were achieved
by the skip-gram model, the thesaurus it generated was used to evaluate the Afaan
Oromo text retrieval system. The system was tested with and without the thesaurus. The
experiments showed average precision and recall of 43% and 39.2%, respectively, without the
thesaurus, and 65.6% and 76%, respectively, with the thesaurus.
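
The idea of thesaurus-based query expansion and its evaluation by precision and recall can be illustrated with a minimal sketch. This is not the system developed in this work: the function names, the hand-written thesaurus, and the toy English documents are all hypothetical placeholders, and the retrieval step is a simple "contains any query term" match. In the described system the related terms would come from the skip-gram model trained on Afaan Oromo documents rather than from a fixed dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data, used only for illustration (not from the study's corpus).
DOCS: Dict[str, str] = {
    "d1": "car engine repair",
    "d2": "automobile service guide",
    "d3": "cooking recipes",
}
THESAURUS: Dict[str, List[str]] = {"car": ["automobile", "vehicle"]}
RELEVANT: Set[str] = {"d1", "d2"}  # assumed relevance judgments for the query "car"


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]],
                 top_k: int = 2) -> List[str]:
    """Append up to top_k related terms per query term, without duplicates."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for related in thesaurus.get(term, [])[:top_k]:
            if related not in seen:
                expanded.append(related)
                seen.add(related)
    return expanded


def retrieve(query_terms: List[str], docs: Dict[str, str]) -> Set[str]:
    """Return ids of documents that contain at least one query term."""
    wanted = set(query_terms)
    return {doc_id for doc_id, text in docs.items()
            if wanted & set(text.lower().split())}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    query = ["car"]
    # Without the thesaurus, only documents containing "car" are found.
    print(precision_recall(retrieve(query, DOCS), RELEVANT))
    # With expansion, documents that use a related term are found as well.
    print(precision_recall(retrieve(expand_query(query, THESAURUS), DOCS), RELEVANT))
```

In this toy setting the document that uses a related term instead of the query word is missed without expansion and retrieved with it, which is the kind of recall gain that query expansion aims for. The numbers produced by the sketch have no relation to the results reported above.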