dc.description.abstract |
Text classification is one of the most widely used natural language processing technologies.
It is the technique of assigning unstructured text data to meaningful categorical
classes. With the continuously increasing amount of online information, there is a pressing
need to classify text in order to extract valuable information. Previously, many researchers
have performed Afan Oromo text classification using machine learning methods. However, most of
these traditional methods use TF-IDF or bag-of-words features to map a representation of the
input data to a predefined set of meaningful outputs while ignoring the context and internal
hierarchy of the text. In addition, the traditional approach treats labels as independent
individuals and ignores the relationships between them, which not only fails to reflect reality
but also leads to a significant loss of semantic information. These limitations can be addressed
by deep learning methods. In this study, we therefore use a Single-layer Multi-Size Filters
Convolutional Neural Network (SMF-CNN) for document text classification, and we collect a
dataset of 6450 documents organized into ten classes. We also examine how preprocessing approaches affect
the performance of the Single-layer Multi-Size Filters Convolutional Neural Network. After
hyperparameter tuning of our model, the performance of the SMF-CNN is evaluated in three
settings: with FastText pre-trained word embeddings, with Word2vec pre-trained word embeddings,
and without pre-trained embeddings. The experimental results show that the Single-layer
Multi-Size Filters Convolutional Neural Network achieves both effectiveness and good
scalability, reaching an accuracy of 96.81%; the FastText pre-trained word embeddings yield the highest accuracy. |
en_US |