OCR-D Coordination Project - Project details

Coordinated funding initiative for the further development of optical character recognition (OCR) processes, Phase 3





The OCR-D project aims to lay the conceptual and technical groundwork for converting historical prints from the 16th to 18th centuries into full text. To this end, the open-source OCR-D software was developed. It breaks full-text recognition down into individual steps, so that the optimal workflow can be assembled for each print and the resulting full texts are suitable for scholarly use.

In Phase 3, the software is to be stabilized and put into use in mass digitization. It is also to become more user-friendly and scalable. Achieving these goals requires a coordinating committee that ensures the uniform development of the interfaces and the various components of the OCR-D software, and that settles cross-project implementation details such as load balancing, deployment, and data management.

The coordination is carried out by five institutions, each contributing a different focus: Herzog August Bibliothek, Berlin-Brandenburg Academy of Sciences and Humanities, Staatsbibliothek zu Berlin, Göttingen State and University Library (SUB Göttingen), and Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG). SUB Göttingen, together with the GWDG, contributes in particular to quality assurance, optimization, securing the repositories, and ensuring the long-term sustainability of the software.


