From Historical Newspapers to Machine-Readable Data: The Origami OCR Pipeline

Authors: Liebl, Bernhard; Burghardt, Manuel

Published: 2020

Publisher: CEUR-WS.org

Full text:	https://ul.qucosa.de/id/qucosa%3A92168 https://ul.qucosa.de/api/qucosa%3A92168/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-921689

While historical newspapers have recently gained a lot of attention in the digital humanities, transforming them into machine-readable data by means of OCR poses some major challenges. To address these challenges, we have developed an end-to-end OCR pipeline named Origami. This pipeline is part of a current project on the digitization and quantitative analysis of the German newspaper “Berliner Börsen-Zeitung” (BBZ), from 1872 to 1931. The Origami pipeline reuses existing open source OCR components and additionally offers a new configurable architecture for layout detection, simple table recognition, a two-stage X-Y cut for reading order detection, and a new robust implementation of document dewarping. In this paper we describe the different stages of the workflow and discuss how they meet the above-mentioned challenges posed by historical newspapers.

Source:	BASE Fachausschnitt AVL
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	end-to-end OCR; historical newspapers; layout detection; deep neural networks
License:	info:eu-repo/semantics/openAccess