Please use this identifier to cite or link to this item: http://ir.lib.seu.ac.lk/handle/123456789/3939
Title: Implementation of parallel AHO-Corasick algorithm in Python
Authors: Rushda, M. U. F.
Niruthika, J.
Pranavan, S.
Thambawita, D. R. V. L. B
Keywords: Python
Aho-Corasick
Parallel processing
Multithreading
Issue Date: 27-Nov-2019
Publisher: South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka
Citation: 9th International Symposium 2019 on “Promoting Multidisciplinary Academic Research and Innovation”. 27th - 28th November 2019. South Eastern University of Sri Lanka, University Park, Oluvil, Sri Lanka. pp. 222-233.
Abstract: Python is a programming language that has experienced tremendous growth in its developer community over recent years. Several reasons can be attributed to this including the ease of use and readability. Data processing is one crucial area where python is popular. It is the second most used language for data analysis. Currently, many computer science applications are becoming more and more data-oriented, especially with the advent of machine learning. Therefore, performance improvement in the data processing field should be looked upon keenly. Multithreading is one crucial way to performance improvement in the data processing. Hence, many languages, including python, support multithreading. However, the multithreading module of CPython, the reference implementation of python, suffers controversy due to GIL (Global interpreter lock) which prevents two threads from entering the same section of code simultaneously. While this is advantageous for I/O bound programs, it is considered as the performance hit for CPU bound programs. In this research, the performance of the parallel version of the Aho-Corasick algorithm is compared against its serial version. Aho-Corasick is a widely used algorithm which addresses the problem of exact string matching, which is one of a significant problem in the computer science domain. The parallelising technique used is the division of input text among threads. Here, we have implemented the serial and parallel versions of AhoCorasick in python and found the realtime elapsed for the parallel portion of the algorithm. The input text size is varied, and graphs are drawn to interpret the result. Also, we have used benchmark implementations of parallel and serial versions of the algorithm used for python. The results show that the time performance of parallel Aho-Corasick algorithm is lower than the serial due to GIL (Global interpreter lock), while the time performance of the benchmark implementation of the parallel version is higher than its serial version.
URI: http://ir.lib.seu.ac.lk/handle/123456789/3939
ISBN: 978-955-627-189-8
Appears in Collections:9th International Symposium - 2019

Files in This Item:
File Description SizeFormat 
26.pdf402.35 kBAdobe PDFThumbnail
View/Open


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.