SEUIR Repository

Towards stop words identification in Tamil text clustering

dc.contributor.author Faathima Fayaza, M. S.
dc.contributor.author Fathima Farhath, F.
dc.date.accessioned 2022-02-21T10:50:04Z
dc.date.available 2022-02-21T10:50:04Z
dc.date.issued 2021
dc.identifier.citation International Journal of Advanced Computer Science and Applications, Vol. 12, No. 12, 2021; p. 524-529. en_US
dc.identifier.issn 2156-5570
dc.identifier.uri http://ir.lib.seu.ac.lk/handle/123456789/5994
dc.description.abstract Digital documents have become the primary source of information, and natural language processing is therefore widely used in information retrieval, topic modeling, document classification, and document clustering. Preprocessing plays a significant role in all of these applications, and one of its critical steps is removing stopwords. Many languages have established stopword lists. However, no stopword list is publicly available for Tamil, because it is an under-resourced language. This study identified 93 general stopwords and some domain-specific stopwords for sports, entertainment, local news, and foreign news by analyzing more than 1.7 million Tamil documents with more than 21 million words. The study also shows that removing stopwords improves the accuracy of a Tamil document clustering system: the F-score improved by 2.4% for TF-IDF with the one-pass algorithm and by 0.95% for FastText with the one-pass algorithm. en_US
dc.language.iso en_US en_US
dc.publisher The Science and Information Organization en_US
dc.subject Stopwords en_US
dc.subject Tamil en_US
dc.subject Pre-processing en_US
dc.subject TF-IDF en_US
dc.subject Clustering en_US
dc.title Towards stop words identification in Tamil text clustering en_US
dc.type Article en_US
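
The abstract above does not say how the stopwords were selected. As a rough illustration only, the sketch below uses one common heuristic, a document-frequency cutoff: tokens that occur in a large share of documents are treated as stopword candidates and then removed before clustering. The function names, the whitespace tokenization, the 0.8 cutoff, and the toy English corpus are all assumptions made for this example, not details taken from the article.

```python
from collections import Counter


def candidate_stopwords(documents, min_doc_ratio=0.8):
    """Return tokens that occur in at least min_doc_ratio of the documents.

    Assumption: whitespace tokenization and a document-frequency cutoff;
    the record does not describe the study's actual selection procedure.
    """
    doc_freq = Counter()
    for text in documents:
        # Count each token once per document (document frequency).
        doc_freq.update(set(text.split()))
    cutoff = min_doc_ratio * len(documents)
    return sorted(tok for tok, df in doc_freq.items() if df >= cutoff)


def remove_stopwords(text, stopwords):
    """Drop every token that appears in the stopword collection."""
    stop = set(stopwords)
    return " ".join(tok for tok in text.split() if tok not in stop)


# Toy English corpus standing in for Tamil news text (hypothetical data).
docs = [
    "the match was won by the home team",
    "the film was released in the city",
    "the minister was in the capital",
    "the team was praised by the coach",
]
stops = candidate_stopwords(docs, min_doc_ratio=0.8)
cleaned = [remove_stopwords(d, stops) for d in docs]
```

In practice a candidate list produced this way would still need manual review, since frequent tokens in a single domain (for example, sports terms) may be informative rather than stopwords.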


This item appears in the following Collection(s)

  • Research Articles [915]
    THESE ARE RESEARCH ARTICLES OF ACADEMIC STAFF, PUBLISHED IN JOURNALS AND PROCEEDINGS ELSEWHERE
