Sentence Based Paraphrase Detection Model Using Deep Learning Approach In Case Of Amharic Language.

Eyob, Kefelegn

dc.contributor.author	Eyob, Kefelegn
dc.date.accessioned	2023-10-27T11:24:56Z
dc.date.available	2023-10-27T11:24:56Z
dc.date.issued	2022-11
dc.identifier.uri	http://hdl.handle.net/123456789/3147
dc.description.abstract	Paraphrase identification (PI) is the task of automatically determining whether two sentences are similar enough in meaning to be classified as paraphrases. It is usually treated as a binary classification problem, although the criteria for semantic equivalence (the same or almost the same meaning) are difficult to define precisely and can vary from task to task. A paraphrase is an alternative expression with the same (or similar) meaning. For example, "መርሳት" is a paraphrased form of "ማስታወስ አለመቻል". The identification of paraphrases and the degree of their semantic similarity have proven useful in many NLP applications (Erfaneh Gharavi, Kayvan Bijari and Kiarash Zahirnia, 2017). For example, it can be used as a feature to enhance other NLP tasks such as information retrieval, machine translation scoring, text summarization, and question answering. Although many paraphrase identification systems have been developed for various natural languages, no research has yet been conducted for the Amharic language. The proposed model considers two word embedding methods, word2vec and fastText, and three deep learning models, BiLSTM-GRN, Siamese Network, and Feature Fusion Network (FFN), to detect paraphrased sentences automatically and to compare the accuracy of all models. The proposed model will help people detect paraphrased sentences accurately and quickly, in order to avoid duplicate sentences that carry the same meaning and to detect plagiarism. Since there is no publicly available Amharic paraphrase dataset, the dataset was gathered from the publicly available online Addis Ababa University Institutional Repository, which contains a collection of Amharic-language Master of Arts theses.
The dataset, consisting of sentence pairs annotated with the help of a linguistic expert in the domain, was then prepared. 80% of the data is used to train and develop the deep learning models, and the remaining 20% is used to test their performance. The Siamese neural network model with fastText word embeddings scored an accuracy of 0.9583, outperforming the BiLSTM-GRN and FFN models, which is a promising result for automatic paraphrase detection in Amharic.	en_US
dc.language.iso	en	en_US
dc.publisher	Ambo University	en_US
dc.subject	Amharic	en_US
dc.subject	Gated Relevance Network (GRN)	en_US
dc.subject	Deep Learning (DL)	en_US
dc.title	Sentence Based Paraphrase Detection Model Using Deep Learning Approach In Case Of Amharic Language.	en_US
dc.type	Thesis	en_US
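
Illustrative sketch (not part of the thesis record): the abstract describes an 80/20 train/test split and a Siamese model that scores sentence pairs with one shared encoder. The Python below is a minimal sketch of that general idea under loud assumptions. It does not reproduce the thesis's trained network: the shared encoder here is just the mean of word vectors, the cosine threshold of 0.8 is a hypothetical value, and the tiny embedding table is invented for illustration rather than taken from word2vec or fastText.

```python
import math
import random


def split_pairs(pairs, train_frac=0.8, seed=0):
    """Shuffle the annotated pairs and split them into train and test parts."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def encode(sentence, embeddings, dim):
    """Shared encoder applied to both sentences (the Siamese idea).

    Assumption: a plain average of word vectors stands in for the
    learned encoder; unknown words are skipped.
    """
    vectors = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def is_paraphrase(s1, s2, embeddings, dim, threshold=0.8):
    """Binary decision: both sentences pass through the same encoder."""
    similarity = cosine(encode(s1, embeddings, dim), encode(s2, embeddings, dim))
    return similarity >= threshold


# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.1]}
toy_pairs = [("a", "c", 1), ("a", "b", 0)] * 5  # 10 labeled pairs
train, test = split_pairs(toy_pairs)
```

In a real Siamese network the encoder weights are learned from the training pairs and the decision boundary is learned rather than fixed. The point of the sketch is only that both sentences share one encoder before they are compared.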