An ensemble approach to detect hate speech in Tamil tweets

Shibly, F. H. A.; Sharma, Uzzal; Naleer, H. M. M.

dc.contributor.author	Shibly, F. H. A.
dc.contributor.author	Sharma, Uzzal
dc.contributor.author	Naleer, H. M. M.
dc.date.accessioned	2023-01-25T09:51:31Z
dc.date.available	2023-01-25T09:51:31Z
dc.date.issued	2022-09-28
dc.identifier.citation	Proceedings of the 9th International Symposium - 2022 on “Socio-Economic Development through Arabic and Islamic Studies”. 28th September 2022. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. pp. 511.	en_US
dc.identifier.issn	978-624-5736-55-3
dc.identifier.uri	http://ir.lib.seu.ac.lk/handle/123456789/6463
dc.description.abstract	Advances in communication technologies have connected people on a worldwide level. These technologies are critical in ensuring freedom of speech because they allow individuals to express their thoughts, behaviors, and opinions openly. However, this openness also creates opportunities for racism, trolling, and exposure to a flood of offensive online content. As a result, the exponential growth of hate speech on social media significantly impacts society. In this research, the researchers applied machine learning and deep learning algorithms to detect hate speech and compared the performance of those algorithms in order to develop an ensemble model. The researchers collected and combined two Tamil-language hate speech tweet datasets created by Bharathi Raja Chakravarthi et al. Tweets in the combined dataset are classified into two categories: not offensive and offensive. The dataset contains 10,129 tweets. Six machine learning and deep learning algorithms were selected for this study: Support Vector Machine (SVM), Logistic Regression (LR), Naïve Bayes (NB), Bidirectional LSTM, Multi-layer Perceptron (MLP), and Multilingual BERT. In detecting hate speech, SVM (82%) and LR (82%) achieved the best accuracy. Furthermore, the researchers developed two ensemble models to construct the most efficient model. The first ensemble combined SVM, LR, and NB, and the second combined SVM and LR. Four models, including the two ensembles, obtained the same accuracy. Therefore, the researchers compared F1 scores and found that ensemble model 02 outperformed the other classifiers. The findings are important because they can serve as a model study for Tamil-language hate speech detection, against which future work using different machine learning algorithms can be evaluated, with the aim of detecting hate speech more accurately and efficiently.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Faculty of Islamic Studies and Arabic Language, South Eastern University of Sri Lanka, University Park, Oluvil.	en_US
dc.subject	Machine Learning	en_US
dc.subject	Deep Learning	en_US
dc.subject	Algorithms	en_US
dc.subject	Hate Speech	en_US
dc.subject	Detection and Ensemble Model	en_US
dc.title	An ensemble approach to detect hate speech in Tamil tweets	en_US
dc.type	Article	en_US
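
The abstract reports that several models tied on accuracy and were then separated by F1 score. The sketch below illustrates that idea only and does not reproduce the authors' implementation: the record does not say how the ensembles combine their members, so soft voting (averaging each model's probability for the "offensive" class) is an assumption, and the function names, probabilities, and labels are hypothetical toy values, not results from the study.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each model's P(offensive) per tweet and threshold the mean.

    Assumption: every model outputs a probability for the offensive class.
    """
    n_models = len(prob_lists)
    return [int(sum(ps) / n_models >= threshold) for ps in zip(*prob_lists)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = offensive, 0 = not offensive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # one missed tweet, one false alarm
pred_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # two missed tweets, no false alarm

# Both classifiers score the same accuracy, but their F1 scores differ.
print(accuracy(y_true, pred_a), accuracy(y_true, pred_b))
print(f1_score(y_true, pred_a), f1_score(y_true, pred_b))

# Hypothetical per-tweet probabilities from two models (e.g. SVM and LR).
svm_probs = [0.9, 0.4, 0.2]
lr_probs = [0.7, 0.7, 0.1]
print(soft_vote([svm_probs, lr_probs]))
```

In this toy data both classifiers make two errors out of ten, yet the one that recovers more offensive tweets has the higher F1 score, which is why F1 can break a tie that accuracy alone cannot.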