Paraphrase Detection in Afan Oromo Texts Using Deep Learning Techniques and Its Application in Automatic Plagiarism Detection

Wakjira, Bekele

Collection: Theses and Dissertations, Department of Computer Science, Institute of Technology

dc.contributor.author	Wakjira, Bekele
dc.date.accessioned	2023-11-02T06:46:33Z
dc.date.available	2023-11-02T06:46:33Z
dc.date.issued	2021-12
dc.identifier.uri	http://hdl.handle.net/123456789/3182
dc.description.abstract	This study reports our investigation of, and experiments on, paraphrase detection and its application to automatic plagiarism detection for Afan Oromo texts. Paraphrasing is restating a sentence in a different form, for example by replacing a keyword with a synonym, adding a phrase, or elaborating on a particular word, so that the same message is conveyed without changing its meaning. However, the rapid growth of digital media and paraphrasing tools has increased the opportunity to commit paraphrase plagiarism, which is hard to detect. Such plagiarism is a persistent problem for plagiarism detection systems, because most of them (many of which are commercial) are designed to detect word co-occurrences and light modifications and cannot detect text that has undergone heavy semantic, structural, or paraphrase changes. Paraphrase detection is a natural language processing task that determines the degree to which two text segments express the same meaning, and it plays an important role in detecting paraphrase plagiarism. It also has many other applications in natural language processing and understanding, such as machine translation, information retrieval, and question answering. Many studies on paraphrase detection have been reported and implemented for resource-rich languages such as English, Chinese, German, and French. To the best of the researcher's knowledge, however, no formal study has been reported for resource-scarce Ethiopian languages such as Afan Oromo, Amharic, Somali, and Sidama. Therefore, this study aimed to design and develop an automatic paraphrase detection model for Afan Oromo texts using deep learning techniques. To this end, a dataset was gathered and prepared from Afan Oromo documents publicly available in the Addis Ababa University Institutional Repository.
Text preprocessing and data annotation were first carried out in cooperation with domain experts. Of the data, 80% was used to train the deep learning models and the remaining 20% to test their performance. The convolutional neural network (CNN) model with fastText word embeddings scored an accuracy of 67%, which is a promising result for automatic paraphrase detection in Afan Oromo texts.	en_US
dc.language.iso	en	en_US
dc.publisher	Ambo University	en_US
dc.subject	Afan Oromo	en_US
dc.subject	Deep Learning	en_US
dc.subject	Paraphrase	en_US
dc.title	Paraphrase Detection in Afan Oromo Texts Using Deep Learning Techniques and Its Application in Automatic Plagiarism Detection	en_US
dc.type	Thesis	en_US