Abstract:
Plagiarism is the illegitimate use of part or all of another
person's work as one's own in any field, such as art, poetry,
literature, cinema, research and other creative forms of study.
Plagiarism is an important issue in academic and research
fields and is receiving growing attention in academic
systems. The situation is even worse with the
availability of ample resources on the web. This paper presents
an effective plagiarism detection tool and identifies a suitable
intra-corpal plagiarism detection technique for text-based
assignments by comparing unigram, bigram and trigram vector
space models with the cosine similarity measure. A manually
evaluated, labelled dataset was tested using unigram, bigram
and trigram vectors.
Even though the trigram vector consumes comparatively more
time, it shows better results with the labelled data. In addition,
the selected trigram vector space model with the cosine similarity
measure is compared with a trigram sequence matching
technique with the Jaccard measure. In the results, the cosine
similarity scores are slightly higher than the Jaccard scores
because the vector space model gives more weight to terms that
occur infrequently in the dataset. The cosine similarity measure
with the trigram technique is therefore preferable to the
sequence matching technique.
We therefore present our new tool, which could be used as an
effective means to evaluate text-based electronic assignments and
minimize plagiarism among students.
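
To make the two measures compared in this abstract concrete, the
following Python sketch computes a cosine similarity over weighted
word n-gram vectors and a Jaccard score over sets of word trigrams.
It is a minimal illustration under stated assumptions and not the
implementation of the tool itself: the function names, the
regular-expression tokenizer, the TF-IDF weighting with a smoothed
IDF formula, and the sample documents are all hypothetical choices
made for this example.

```python
import math
import re
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased text."""
    # Assumption: tokens are runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Build one sparse TF-IDF vector (a dict) per document."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    total = len(docs)
    # Assumption: smoothed IDF, so rarer n-grams get larger weights.
    idf = {g: math.log((1 + total) / (1 + k)) + 1.0 for g, k in df.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(text_a, text_b, n=3):
    """Jaccard score between the sets of word n-grams of two texts."""
    a, b = set(ngrams(text_a, n)), set(ngrams(text_b, n))
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical sample submissions, used only for illustration.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "students submit text based assignments every week",
]
vectors = tfidf_vectors(docs, 3)
print("cosine :", cosine(vectors[0], vectors[1]))
print("jaccard:", jaccard(docs[0], docs[1]))
```

Calling tfidf_vectors with n set to 1 or 2 gives the unigram and
bigram variants. In this sketch the weighting depends on the whole
collection, so adding or removing documents changes the cosine
scores, whereas the Jaccard score depends only on the two texts
being compared.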