HAL Id: hal-02155796
https://hal.archives-ouvertes.fr/hal-02155796
Submitted on 13 Jun 2019
ENTANGLED HISTORIES - MAKING ORDINANCES SEARCHABLE (±1500-1800)
Christel Annemieke Romein, Michel de Gruyter, Sara Floor Veldhoen
HISTORY – INSTITUTE FOR EARLY MODERN HISTORY / GHENTCDH RESEARCH LAB
Christel Annemieke Romein
Contact: Annemieke.Romein@ugent.be
What metadata (categories) is added?
The Max-Planck-Institut für europäische Rechtsgeschichte (Max Planck Institute for European Legal History) created a hierarchical structure of categories within its Policeygesetzgebung (‘police legislation’) project (Karl Härter, Michael Stolleis). These categories will be applied as far as possible to enable international comparisons.
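A hierarchical category scheme makes international comparison possible even when ordinances from different regions are categorised at different levels of detail: any specific category can be generalised to its parent. The sketch below illustrates this with a nested dictionary. The category labels and the helper names `path_to` and `ancestor_at` are hypothetical placeholders, not the actual categories or tools of the Policeygesetzgebung project.

```python
from typing import Dict, List, Optional

# A category maps to its sub-categories; leaves map to an empty dict.
CategoryTree = Dict[str, "CategoryTree"]

# HYPOTHETICAL placeholder labels, not the Max-Planck-Institut's real scheme.
SCHEME: CategoryTree = {
    "Public order": {
        "Begging and vagrancy": {},
        "Taverns and gambling": {},
    },
    "Economy": {
        "Trade": {"Grain trade": {}},
        "Taxation": {},
    },
}


def path_to(tree: CategoryTree, label: str,
            prefix: Optional[List[str]] = None) -> Optional[List[str]]:
    """Return the labels from the root down to `label`, or None if it is absent."""
    prefix = prefix or []
    for name, children in tree.items():
        current = prefix + [name]
        if name == label:
            return current
        found = path_to(children, label, current)
        if found is not None:
            return found
    return None


def ancestor_at(tree: CategoryTree, label: str, depth: int) -> Optional[str]:
    """Generalise a category to a coarser level (depth 0 = top level)."""
    path = path_to(tree, label)
    if path is None or depth >= len(path):
        return None
    return path[depth]
```

For example, `path_to(SCHEME, "Grain trade")` returns `["Economy", "Trade", "Grain trade"]`, and `ancestor_at(SCHEME, "Grain trade", 0)` returns `"Economy"`, so an ordinance tagged only as "Economy" in one region can still be compared with a more finely tagged one elsewhere.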
Improved searchability
Improving searchability makes it possible to search for, e.g., keywords, dates and titles. The results can be exported and used to search the books of ordinances more quickly and to visualise the output.
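Once titles, dates and body text have been recognised, searching and exporting reduce to filtering structured records. The following is a minimal sketch assuming a hypothetical record layout (`Ordinance` with `title`, `issued` and `text` fields); the project's actual metadata schema may differ.

```python
import csv
import io
from dataclasses import dataclass, asdict
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Ordinance:
    # HYPOTHETICAL record layout, for illustration only.
    title: str
    issued: date
    text: str


def search(records: Iterable[Ordinance], keyword: Optional[str] = None,
           title_contains: Optional[str] = None,
           start: Optional[date] = None, end: Optional[date] = None) -> List[Ordinance]:
    """Filter by keyword in the body, substring in the title and issue-date range (all optional)."""
    hits = []
    for r in records:
        if keyword and keyword.lower() not in r.text.lower():
            continue
        if title_contains and title_contains.lower() not in r.title.lower():
            continue
        if start and r.issued < start:
            continue
        if end and r.issued > end:
            continue
        hits.append(r)
    return hits


def export_csv(records: Iterable[Ordinance]) -> str:
    """Serialise search results as CSV text, e.g. for visualisation in another tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "issued", "text"])
    writer.writeheader()
    for r in records:
        row = asdict(r)
        row["issued"] = r.issued.isoformat()  # ISO dates sort and plot cleanly
        writer.writerow(row)
    return buf.getvalue()
```

Because early modern spelling varies, a plain substring match like this will miss variants; it is meant only to show the filter-then-export workflow.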
Aims of the Project:
This project:
(1) improves the currently applied Optical Character Recognition (OCR) to a much higher recognition standard by using Handwritten Text Recognition (HTR);
(2) enhances readability by systematically segmenting individual texts and recognising text sections: beginnings and ends, columns, titles, dates, summaries and the body of the text;
(3) applies a standard categorisation (metadata) using a machine-learned algorithm.
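The poster does not specify which machine-learned algorithm assigns the categories. As a stand-in, the sketch below uses a multinomial Naive Bayes text classifier with add-one smoothing, one common baseline for assigning labels to documents. The class name, the whitespace tokeniser and any training labels are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    # Lower-case whitespace tokenisation; real early modern Dutch would need spelling normalisation.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (a stand-in technique)."""

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayes":
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in docs:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_tokens = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training pairs would be `(ordinance_text, category)` tuples from manually categorised ordinances; with the hierarchical scheme above, the predicted labels could be leaf categories.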
Sources and Readability
This project uses the books of ordinances (‘plakkaatboeken’) issued by the various governments of the Low Countries (North and South).
These have been digitised within the Google Books project, but their OCR quality is poor (an estimated 40% Character Error Rate (CER)).
This project aims to improve the OCR quality to a CER below 5% in order to make the texts more searchable for both researchers and computers.
The readability of the texts is improved by using the Handwritten Text Recognition (HTR) tool Transkribus.
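The Character Error Rate is conventionally computed as the character-level edit distance between the recognised text and a correct reference transcription, divided by the number of characters in the reference: CER = (substitutions + deletions + insertions) / reference length. At the 5% target, a page with 1,000 reference characters may contain at most 50 character errors; at 40%, it contains around 400. A minimal sketch of the standard computation (function names are illustrative):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character substitutions, deletions and insertions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For example, `cer("Placcaet", "Plaecaet")` is `0.125`: one substituted character out of eight.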
Relevance and implications
• Improved access to a huge body of normative texts from the early modern era, allowing longitudinal research into ‘common’ problems.
• A frequently used resource is now placed in context (e.g. similar texts from other areas and periods can be found) and made accessible through metadata.
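Finding similar texts from other areas and periods requires a similarity measure that tolerates the inconsistent spelling of early modern Dutch. One common option, shown here as an assumed technique rather than the project's method, is cosine similarity over character trigrams, which still matches spelling variants that share most of their letters. Document ids and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with spaces so word edges count."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query: str, corpus: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank corpus texts (keyed by a hypothetical document id) by similarity to the query."""
    q = char_ngrams(query)
    scores = [(doc_id, cosine(q, char_ngrams(text))) for doc_id, text in corpus.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

Comparing every pair of texts this way is quadratic, so at the scale of whole books of ordinances an indexing step would likely be needed; the sketch only shows the scoring.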
Groot Plakkaatboeken – Museum De Roos (Geertruidenberg)
http://www.museumderoos.nl/index.php?menuitemID=105&taalID=2
Project team, National Library (NL)
Michel de Gruijter – Project Advisor
Annemieke Romein – Principal Investigator
Sara Veldhoen – Research Software Engineer
Underlying hypothesis
When problems arose, small ‘states’ had to act swiftly. I therefore assume that they may have adopted (parts of) successful legislation from neighbouring ‘states’; hence ‘entangled histories’.
Image Habsburg Netherlands: David Descamps, https://commons.wikimedia.org/wiki/File:Spanish_Netherlands.svg#/media/File:Spanish_Netherlands.svg