Abstract:
|
[EN] Archives around the world hold vast digitized series of historical manuscript books or "bundles" containing, among others,
notarial records also known as "deeds" or "acts". One of the first steps to provide metadata which describe the contents of
those bundles is to segment them into their individual deeds. Even if deeds are often page-aligned, as in the bundles considered in the present work, this is a time-consuming task, often prohibitive given the huge scale of the manuscript series
involved. Unlike traditional Layout Analysis methods for page-level segmentation, our approach goes beyond the realm of a
single-page image, providing consistent deed detection results on full bundles. This is achieved in two tightly integrated steps:
first, we estimate the class-posterior at the page level for the "initial", "middle", and "final" classes; then we "decode" these
posteriors applying a series of sequentiality consistency constraints to obtain a consistent book segmentation. Experiments are
presented for four large historical manuscripts, varying the number of "deeds" used for training. Two metrics are introduced
to assess the quality of book segmentation, one of them taking into account the loss of information entailed by segmentation
errors. The problem formalization, the metrics and the empirical work significantly extend our previous works on this topic.
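
The two-step idea described above (page-level class posteriors followed by a consistency-constrained decoding) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the class labels "I", "M", "F", the transition table, the requirement that every deed spans at least two pages, and the use of a Viterbi-style dynamic program as the decoder. The function names `decode_bundle` and `deed_spans` are hypothetical.

```python
import math

CLASSES = ("I", "M", "F")  # initial, middle, final page of a deed

# Assumed consistency constraints: a deed is I, then zero or more M, then F,
# and a new deed can only start (I) right after the previous one ended (F).
ALLOWED_NEXT = {
    "I": ("M", "F"),
    "M": ("M", "F"),
    "F": ("I",),
}


def decode_bundle(posteriors):
    """Return the most probable consistent label sequence for a bundle.

    posteriors: one dict per page, mapping each class to its probability.
    The first page is forced to be "I" and the last page to be "F".
    """
    if len(posteriors) < 2:
        raise ValueError("this sketch assumes bundles of at least two pages")

    def logp(page, label):
        # Floor the probability so that a zero posterior does not give log(0).
        return math.log(max(posteriors[page][label], 1e-12))

    # score[c]: best log-probability of a consistent prefix ending in class c.
    score = {c: (logp(0, c) if c == "I" else -math.inf) for c in CLASSES}
    backpointers = []
    for page in range(1, len(posteriors)):
        new_score, pointers = {}, {}
        for label in CLASSES:
            best_prev, best = None, -math.inf
            for prev in CLASSES:
                if label in ALLOWED_NEXT[prev] and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[label] = best + logp(page, label)
            pointers[label] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Backtrack from the forced final state "F".
    labels = ["F"]
    for pointers in reversed(backpointers):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels


def deed_spans(labels):
    """Convert a consistent label sequence into (first_page, last_page) spans."""
    spans, start = [], None
    for page, label in enumerate(labels):
        if label == "I":
            start = page
        elif label == "F":
            spans.append((start, page))
    return spans


# Made-up posteriors for a five-page bundle. Taking the most probable class
# of each page independently gives I M M I F, which is not a valid segmentation.
example = [
    {"I": 0.80, "M": 0.10, "F": 0.10},
    {"I": 0.10, "M": 0.60, "F": 0.30},
    {"I": 0.10, "M": 0.50, "F": 0.40},
    {"I": 0.90, "M": 0.05, "F": 0.05},
    {"I": 0.10, "M": 0.20, "F": 0.70},
]
labels = decode_bundle(example)
spans = deed_spans(labels)
```

In this toy example the decoder relabels the third page as "F", because a "middle" page cannot be followed directly by an "initial" page, and the bundle is split into two deeds.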
|
Acknowledgements:
|
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Work partially supported by the
research grants: the SimancasSearch project as Grant PID2020-116813RB-I00a funded by MCIN/AEI/10.13039/501100011033 and ValgrAI - Valencian Graduate School and Research Network of Artificial Intelligence and the Generalitat Valenciana, co-funded by the European Union. The second author's work was partially supported by
the Universitat Politècnica de València under grant FPI-I/SP20190010. The third author's work is supported by a María Zambrano grant from the Spanish Ministerio de Universidades and the European Union NextGenerationEU/PRTR.
|