dc.description.abstract |
The rapidly increasing popularity of the World Wide Web, smartphones, and social media
networks has resulted in the exponential growth and real-time dissemination of online news and
digital content. Because many social media users create and share stories intended
to misinform or deceive readers, they have played a major role in the proliferation of fabricated
information such as fake news and fake reviews. Fake news has a direct impact on democracy and
may erode public trust and justice, and its extent is increasing at an alarming
rate. These properties of fake news have created an urgent need for high-tech
methods for its detection. Fake news detection is challenging because such
content is deliberately created to misinform consumers. Recently, social networks have started
employing detection tools to educate the public on how to recognize fake news. In the literature,
it was observed that several machine learning and ensemble algorithms and fake news datasets have
been developed and applied for the detection of fake news produced in resource-rich languages
such as English and Portuguese. However, there is no reliable automated method or public fake news
dataset for the detection of Afaan Oromo fake news on social media using advanced ensemble machine
learning approaches. In this study, a new Afaan Oromo dataset is prepared, and two advanced
ensemble approaches, stacking and voting, are proposed and adopted to fill the identified
gap. Two different feature extraction techniques were investigated and compared with a
combination of base classifiers, stacking, and voting. Performance metrics such as accuracy,
F1-score, recall, precision, ROC curve, and precision-recall curve were used to measure the
performance of the proposed approaches. The experimental results showed that combining
classifiers can effectively improve the performance of Afaan Oromo fake news detection; up to
96.0% accuracy was achieved with the minimum error value. The combination based on the stacking
ensemble was consistently effective with Uni+TF-IDF (accuracy, 96.0% and F1-score, 95.9%),
Uni+Bi+TF-IDF (accuracy, 95.6% and F1-score, 95.7%), and Uni+Tri+TF-IDF (accuracy, 95.3%
and F1-score, 95.6%). Although not as effective as stacking, the voting ensemble approach also
performed well with Uni+TF-IDF (accuracy, 95.8% and F1-score, 95.6%), Uni+Bi+TF-IDF
(accuracy, 95.1% and F1-score, 95.2%), and Uni+Tri+TF-IDF (accuracy, 95.1% and F1-score,
95.4%). The stacking and voting methods also exhibited precise prediction performance, with precision
values in the range of 93.4–96.3% and 93.8–95.6% obtained for stacking and voting,
respectively. The proposed ensemble approaches also outperformed the individual base classifiers (Random
Forest, AdaBoost, K-nearest neighbor, Extra Tree, and Logistic Regression) in terms of accuracy,
F1-score, and precision. Accordingly, the stacking and voting ensembles with the Uni+TF-IDF,
Uni+Bi+TF-IDF, and Uni+Tri+TF-IDF methods were found to be more promising for Afaan
Oromo fake news detection. |
en_US |