SEUIR Repository

An ensemble approach to detect hate speech in Tamil tweets

dc.contributor.author Shibly, F. H. A.
dc.contributor.author Sharma, Uzzal
dc.contributor.author Naleer, H. M. M.
dc.date.accessioned 2023-01-25T09:51:31Z
dc.date.available 2023-01-25T09:51:31Z
dc.date.issued 2022-09-28
dc.identifier.citation Proceedings of the 9th International Symposium - 2022 on “Socio-Economic Development through Arabic and Islamic Studies”. 28th September 2022. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. pp. 511. en_US
dc.identifier.issn 978-624-5736-55-3
dc.identifier.uri http://ir.lib.seu.ac.lk/handle/123456789/6463
dc.description.abstract Advances in communication technologies have connected people worldwide. These technologies are critical to freedom of speech because they allow individuals to express their thoughts, behaviors, and opinions openly. However, they also create opportunities for racism, trolling, and exposure to a flood of offensive online content. As a result, the exponential growth of hate speech on social media significantly impacts society. In this research, the researchers applied machine learning and deep learning algorithms to detect hate speech and compared the performance of those algorithms in order to develop an ensemble model. The researchers collected and combined two Tamil-language hate speech tweet datasets created by Bharathi Raja Chakravarthi et al. The combined dataset contains 10,129 tweets, each classified into one of two categories: not offensive or offensive. The researchers selected six machine learning and deep learning algorithms for this study: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), Bidirectional LSTM, Multi-layer Perceptron (MLP), and Multilingual BERT. In detecting hate speech, SVM and LR achieved the best accuracy (82% each). Furthermore, the researchers developed two ensemble models to construct the most efficient model. The first ensemble model combined SVM, LR, and NB, and the second combined SVM and LR. Four models, including the two ensemble models, obtained the same accuracy. Therefore, the researchers compared F1 scores and found that ensemble model 02 outperformed the other classifiers. The findings are significant because they can serve as a reference study for Tamil-language hate speech detection, against which future work using different machine learning algorithms can be evaluated as it seeks to detect hate speech more accurately and efficiently. en_US
dc.language.iso en_US en_US
dc.publisher Faculty of Islamic Studies and Arabic Language, South Eastern University of Sri Lanka, University Park, Oluvil. en_US
dc.subject Machine Learning en_US
dc.subject Deep Learning en_US
dc.subject Algorithms en_US
dc.subject Hate Speech en_US
dc.subject Detection and Ensemble Model en_US
dc.title An ensemble approach to detect hate speech in Tamil tweets en_US
dc.type Article en_US
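
The abstract notes that several models reached the same accuracy and were then separated by F1 score. The sketch below illustrates how that can happen. It is not the authors' implementation: the abstract does not say how the ensembles combine their members, so hard majority voting is assumed here, and the labels, predictions, and function names are all invented for illustration (1 = offensive, 0 = not offensive).

```python
def majority_vote(predictions):
    """Assumed hard voting: label a tweet offensive (1) only if more than
    half of the models say so; ties fall back to not offensive (0)."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (offensive) class: harmonic mean of precision and recall."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: 4 offensive tweets followed by 6 not-offensive tweets.
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one miss, one false alarm
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # two misses, no false alarms
model_c = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]  # weaker third model

ensemble = majority_vote([model_a, model_b, model_c])

for name, pred in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
    print(name, accuracy(y_true, pred), round(f1_score(y_true, pred), 3))
```

In this toy data, models A and B make the same number of errors, so accuracy cannot separate them, but B misses more offensive tweets and so has lower recall and a lower F1 score. This is the kind of tie that an F1 comparison can break.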

