Abstract:
Plagiarism in assignments, reports, and publications is a rapidly growing problem among students at
universities and other educational institutions, driven by easy access to abundant e-resources on
the Internet. Many tools are available for detecting plagiarism in natural-language text. However,
they become inefficient when processing a large number of lengthy documents because of the time
they consume. Therefore, we propose software-based acceleration of text-based
plagiarism detection using a suitable model on the CPU and GPU. As a CPU baseline, a serial
vector space model was first implemented and tested on 1000 text-based documents, specifically
students’ assignments, for which plagiarism detection took 1641 s. Because computation time is the
performance bottleneck when processing a large number of text-based sources of different sizes, we
focus on accelerating the model and optimizing it with respect to the number of documents.
Therefore, this research aims to
implement and optimize the vector space model on a Graphics Processing Unit (GPU) using
Compute Unified Device Architecture (CUDA). A parallel version of the model was developed on the
GPU using CUDA and tested with the same dataset; it took only 36 s, a 45x speedup over the CPU
implementation. With further optimization, it took only 4 s for the same dataset, which was 389x
faster than the serial implementation.
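
For readers unfamiliar with the vector space model, the serial comparison step can be sketched
roughly as follows. This is a minimal illustration, not the implementation evaluated here: it
assumes whitespace tokenization, raw term-frequency weighting, and cosine similarity, none of which
is specified in the abstract, and the names term_vector, cosine_similarity, and
pairwise_similarities are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Map a document to a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def pairwise_similarities(documents):
    """Compare every pair of documents one after another (serial baseline)."""
    vectors = [term_vector(d) for d in documents]
    return {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

In a sketch like this, the number of comparisons grows quadratically with the number of documents,
and each pair is scored independently of the others, which is why this kind of workload is a
natural candidate for parallel execution on a GPU.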