SEUIR Repository

Implementation of parallel AHO-Corasick algorithm in Python

Show simple item record

dc.contributor.author Rushda, M. U. F.
dc.contributor.author Niruthika, J.
dc.contributor.author Pranavan, S.
dc.contributor.author Thambawita, D. R. V. L. B
dc.date.accessioned 2019-11-26T11:15:35Z
dc.date.available 2019-11-26T11:15:35Z
dc.date.issued 2019-11-27
dc.identifier.citation 9th International Symposium 2019 on “Promoting Multidisciplinary Academic Research and Innovation”. 27th - 28th November 2019. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. pp. 222-233. en_US
dc.identifier.isbn 978-955-627-189-8
dc.identifier.uri http://ir.lib.seu.ac.lk/handle/123456789/3939
dc.description.abstract Python is a programming language that has experienced tremendous growth in its developer community over recent years. Several reasons can be attributed to this including the ease of use and readability. Data processing is one crucial area where python is popular. It is the second most used language for data analysis. Currently, many computer science applications are becoming more and more data-oriented, especially with the advent of machine learning. Therefore, performance improvement in the data processing field should be looked upon keenly. Multithreading is one crucial way to performance improvement in the data processing. Hence, many languages, including python, support multithreading. However, the multithreading module of CPython, the reference implementation of python, suffers controversy due to GIL (Global interpreter lock) which prevents two threads from entering the same section of code simultaneously. While this is advantageous for I/O bound programs, it is considered as the performance hit for CPU bound programs. In this research, the performance of the parallel version of the Aho-Corasick algorithm is compared against its serial version. Aho-Corasick is a widely used algorithm which addresses the problem of exact string matching, which is one of a significant problem in the computer science domain. The parallelising technique used is the division of input text among threads. Here, we have implemented the serial and parallel versions of AhoCorasick in python and found the realtime elapsed for the parallel portion of the algorithm. The input text size is varied, and graphs are drawn to interpret the result. Also, we have used benchmark implementations of parallel and serial versions of the algorithm used for python. The results show that the time performance of parallel Aho-Corasick algorithm is lower than the serial due to GIL (Global interpreter lock), while the time performance of the benchmark implementation of the parallel version is higher than its serial version. en_US
dc.language.iso en_US en_US
dc.publisher South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka en_US
dc.subject Python en_US
dc.subject Aho-Corasick en_US
dc.subject Parallel processing en_US
dc.subject Multithreading en_US
dc.title Implementation of parallel AHO-Corasick algorithm in Python en_US
dc.type Article en_US


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search SEUIR


Advanced Search

Browse

My Account