• Aucun résultat trouvé

Extraction of metadata related to "image" and "structure" of old documents

N/A
N/A
Protected

Academic year: 2021

Partager "Extraction of metadata related to "image" and "structure" of old documents"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-03167272

https://hal.archives-ouvertes.fr/hal-03167272

Submitted on 11 Mar 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Extraction of metadata related to ”image” and

”structure” of old documents

Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Rémy Mullot

To cite this version:

Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Rémy Mullot. Extraction of metadata related

to ”image” and ”structure” of old documents. Visual Recognition and Machine Learning Summer

School (VRML), Jul 2012, Grenoble, France. �hal-03167272�

(2)

Extraction of metadata related to "image" and "structure"

of old documents

Maroua MEHRI a,b , Petra GOMEZ-KRÄMER a , Pierre HÉROUX b and Rémy MULLOT a

{a}L3I, University of La Rochelle, France {b}LITIS, University of Rouen, France

{maroua.mehri, petra.gomez, remy.mullot}@univ-lr.fr and pierre.heroux@univ-rouen.fr

C ONTEXT

Our work is a part of the DIGIDOC project (Document Image diGitisa- tion with Interactive DescriptiOn Capability). DIGIDOC is funded by ANR (French National Research Agency).

O BJECTIVE

• Control the quality of old document image digitization

• Construct a computer-aided catego- rization tool of pages

• Characterize the document’s content using intermediate level metadata

• Provide a similarity measure between pages

1. Extraction of descriptors per block 2. Generation of topological signature

T EXTURE FEATURES

!

"

# $ % &

# % % '

(%)

( % )

*+$ ,- ,

"!-"!

,&-,&

!(- !(

(-(

We propose a feature vector composed of texture indices, all based on the autocorrela- tion function and the directional rose

[1,2]

.

C ONSENUS CLUSTERING

cluster number (k)

delta−K

0.0 0.1 0.2 0.3 0.4

2 3 4 5 6 7 8 9 10

agnes diana hclust kmeans merge pam

The optimal cluster number corresponds to the largest change in the area under the cu- mulative density curve ∆ k for the merge consensus matrix

[3]

.

O VERVIEW

! ! !

" #

$ %

! ! !

" #

$ %

$

&

$ ' (

)! '& (

' (

*

"

" !

Our goal is to propose a set of metadata char- acterizing the physical structure of pages in terms of homogeneous areas and topologi- cal relationships.

P ROPOSED APPROACH

!

" #

!

" #

$$$

!" #

$ " %

&

&&&

&&&

' !

(

We use non-parametric unsupervised clus- tering for the computed features to deter- mine the homogeneous regions defined by similar texture indices.

R ESULTS

Example of document segmentation. {(a), (d)} are the original images of the same book, {(b), (e)} are the results of the foreground pixel extraction using an adapted Hierarchical Ascendant Classification and {(c), (f)} are the final results of clustering steps.

R EFERENCES

[1] Journet, N. et al. Document image characteri- zation using a multiresolution analysis of the texture: application to old documents In IJ- DAR’08

[2] Ouji, A. et al. Chromatic/Achromatic Separa- tion in Noisy Document Images In ICDAR’11 [3] Simpson, T. et al. Merged consensus clus- tering to assess and improve class discovery with microarray data In BMC’10

F UTURE RESEARCH

• Validation of autocorrelation indices

• Computation of frequency and statisti- cal attributes and other texture features

• Definition of one or more signatures

for each page (graph of homogeneous

zones)

Références

Documents relatifs

Si certains travaux ont abordé le sujet des dermatoses à l’officine sur un plan théorique, aucun n’a concerné, à notre connaissance, les demandes d’avis

Later in the clinical course, measurement of newly established indicators of erythropoiesis disclosed in 2 patients with persistent anemia inappropriately low

Households’ livelihood and adaptive capacity in peri- urban interfaces : A case study on the city of Khon Kaen, Thailand.. Architecture,

3 Assez logiquement, cette double caractéristique se retrouve également chez la plupart des hommes peuplant la maison d’arrêt étudiée. 111-113) qu’est la surreprésentation

Et si, d’un côté, nous avons chez Carrier des contes plus construits, plus « littéraires », tendant vers la nouvelle (le titre le dit : jolis deuils. L’auteur fait d’une

la RCP n’est pas forcément utilisée comme elle devrait l’être, c'est-à-dire un lieu de coordination et de concertation mais elle peut être utilisée par certains comme un lieu

The change of sound attenuation at the separation line between the two zones, to- gether with the sound attenuation slopes, are equally well predicted by the room-acoustic diffusion

Using the Fo¨rster formulation of FRET and combining the PM3 calculations of the dipole moments of the aromatic portions of the chromophores, docking process via BiGGER software,