dc.contributor.author |
Jahan, M.A.C. Akmal |
|
dc.contributor.author |
Ragel, Roshan G. |
|
dc.date.accessioned |
2018-09-11T04:40:05Z |
|
dc.date.available |
2018-09-11T04:40:05Z |
|
dc.date.issued |
2014-12-22 |
|
dc.identifier.citation |
7th International Conference on "Information and Automation for Sustainability". 22nd-24th Dec, 2014. Colombo, Sri Lanka. |
en_US |
dc.identifier.issn |
2151-1802 |
|
dc.identifier.uri |
http://ir.lib.seu.ac.lk/handle/123456789/3126 |
|
dc.identifier.uri |
https://doi.org/10.1109/ICIAFS.2014.7069552 |
|
dc.description.abstract |
The pool of knowledge available to mankind
depends on the source of learning resources, which can vary
from ancient printed documents to present electronic material.
The rapid conversion of material available in traditional
libraries to digital form needs a significant amount of work if
we are to keep the format and look of the electronic
documents the same as their printed counterparts. Most
printed documents contain not only characters and their
formatting but also associated non-text objects such as
tables, charts and graphical objects. It is challenging to detect
these objects and to preserve the format of their contents
while reproducing them. To address this issue, we
propose an algorithm using local thresholds for word space and
line height to locate and extract all categories of tables from
scanned document images. From the experiments performed on
298 documents, we conclude that our algorithm has an overall
accuracy of about 75% in detecting tables from the scanned
document images. Since the algorithm does not completely
depend on rule lines, it can detect all categories of tables in a
range of scanned documents with different font types, styles
and sizes to extract their formatting features. Moreover, the
algorithm can be applied to locate tables in multi-column
layouts with a small modification to the layout analysis. Treating
tables with their existing formatting features will greatly
help in reproducing printed documents for reprinting and
updating purposes. |
en_US |
dc.language.iso |
en_US |
en_US |
dc.publisher |
IEEE |
en_US |
dc.subject |
OCR-optical character recognition |
en_US |
dc.subject |
Table detection |
en_US |
dc.subject |
Format preservation |
en_US |
dc.title |
Locating tables in scanned documents for reconstructing and republishing |
en_US |
dc.type |
Article |
en_US |
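
The abstract mentions local thresholds for word space and line height but does not give the algorithm itself. The following is only a minimal sketch of that general idea, not the authors' method. It assumes word bounding boxes are already available (for example from an OCR step), that a text line is a candidate table row when some inter-word gap exceeds a multiple of that line's own height, and that a table is a run of consecutive candidate rows. The function names, the `gap_factor` and `min_rows` parameters, and their default values are all hypothetical.

```python
def is_candidate_row(words, gap_factor=1.5):
    """Decide whether one text line looks like a table row.

    words: list of (x_left, x_right, height) tuples, one per word.
    gap_factor: hypothetical multiplier; the threshold is local because
    it scales with the height of this particular line.
    """
    if len(words) < 2:
        return False
    words = sorted(words)  # order words left to right by x_left
    line_height = max(h for _, _, h in words)
    threshold = gap_factor * line_height
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    return any(g > threshold for g in gaps)


def find_table_regions(lines, min_rows=2, gap_factor=1.5):
    """Return inclusive (first, last) line-index pairs for runs of
    at least min_rows consecutive candidate rows."""
    regions, start = [], None
    for i, words in enumerate(lines):
        if is_candidate_row(words, gap_factor):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_rows:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last line of the page.
    if start is not None and len(lines) - start >= min_rows:
        regions.append((start, len(lines) - 1))
    return regions


# Illustrative data: two prose lines surrounding two widely spaced rows.
sample_lines = [
    [(0, 40, 10), (45, 90, 10), (95, 130, 10)],
    [(0, 30, 10), (80, 110, 10), (160, 190, 10)],
    [(0, 25, 10), (80, 105, 10), (160, 185, 10)],
    [(0, 50, 10), (55, 100, 10)],
]
print(find_table_regions(sample_lines))
```

Because the threshold is tied to each line's height rather than to rule lines or a fixed pixel value, a sketch like this is indifferent to font size, which is in the spirit of the claim that the approach does not depend completely on rule lines.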