Optimising multi-document summarisation for efficient digital library document retrieval

Anomija, V.; Akmal Jahan, M. A. C.

URI: http://ir.lib.seu.ac.lk/handle/123456789/7490

Date: 2024-11-06

Abstract:

Finding relevant material quickly is becoming increasingly difficult for researchers and students as the number of research publications in digital libraries grows rapidly. The problem is exacerbated by subscription-based platforms, which frequently restrict access to full publications and force users to base their decisions on abstracts alone. To address these challenges, this study explores multi-document summarization techniques aimed at improving document retrieval and relevance assessment. We evaluate the Maximal Marginal Relevance (MMR), Centroid, and PageRank algorithms, assessing their performance with ROUGE metrics on the Multi-XScience dataset and a manually curated set. The findings show that MMR achieves the best balance between recall and precision, making it a strong choice for summarizing varied sets of scientific documents. Centroid is marginally less accurate but drastically reduces processing time, so it suits situations where speed is essential. PageRank is less effective in this setting but still offers useful insight for ranking-based strategies. The study highlights the value of incorporating these summarization techniques into digital library systems so that users can navigate large collections more effectively and make well-informed decisions. By characterizing the trade-off between computing speed and summary quality, the findings contribute to ongoing efforts to improve scientific knowledge retrieval.
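To illustrate the relevance-versus-redundancy idea behind MMR, the following is a minimal sketch and not the authors' implementation. It assumes bag-of-words cosine similarity in place of whatever sentence representation the study used, and the names `mmr_summary`, `k`, and `lam` are hypothetical. Each step greedily picks the sentence that maximizes `lam * relevance - (1 - lam) * redundancy`.

```python
import math
import re
from collections import Counter


def _vec(text):
    # Bag-of-words term counts; an assumed stand-in for TF-IDF or embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_summary(sentences, query, k=2, lam=0.7):
    """Greedily select k sentences, trading query relevance against redundancy."""
    vecs = [_vec(s) for s in sentences]
    q = _vec(query)
    selected = []
    candidates = list(range(len(sentences)))

    def score(i):
        relevance = _cosine(vecs[i], q)
        # Redundancy is the highest similarity to any already selected sentence.
        redundancy = max((_cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    docs = [
        "Graph ranking scores sentences for summarisation.",
        "Graph ranking scores sentences for summarisation tasks.",
        "Digital libraries restrict access to full papers.",
    ]
    topic = "graph ranking summarisation digital libraries"
    print(mmr_summary(docs, topic, k=2, lam=0.5))
```

With `lam` close to 1 the selection favours relevance alone and may pick near-duplicate sentences; lower values penalize overlap with sentences already chosen, which is the behaviour that matters when the input spans several related documents.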
