Search results

“Shakespeare in the Vectorian Age”: An evaluation of different word embeddings and NLP parameters for the detection of Shakespeare quotes

Author: Liebl, Bernhard; Burghardt, Manuel

Published: 2020

Publisher: International Committee on Computational Linguistics

Full text:	https://ul.qucosa.de/id/qucosa%3A92150 https://ul.qucosa.de/api/qucosa%3A92150/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-921501

In this paper we describe an approach for the computer-aided identification of Shakespearean intertextuality in a corpus of contemporary fiction. We present the Vectorian, which is a framework that implements different word embeddings and various NLP parameters. The Vectorian works like a search engine: a Shakespeare phrase is entered as a query, the underlying collection of fiction books is searched for the phrase, and the passages that are likely to contain it, either verbatim or as a paraphrase, are presented in a ranked results list. While the Vectorian can be used via a GUI, in which many different parameters can be set and combined manually, in this paper we present an ablation study that automatically evaluates different embedding and NLP parameter combinations against a ground truth. We investigate the behavior of different parameters during the evaluation and discuss how our results may be used for future studies on the detection of Shakespearean intertextuality.

Source:	BASE Fachausschnitt AVL
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	Vectorian; computer-aided identification
License:	info:eu-repo/semantics/openAccess

„The Vectorian“ – Eine parametrisierbare Suchmaschine für intertextuelle Referenzen

Author: Liebl, Bernhard; Burghardt, Manuel

Published: 2020

Publisher: Zenodo

Full text:	https://ul.qucosa.de/id/qucosa%3A92158 https://ul.qucosa.de/api/qucosa%3A92158/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-921583

Source:	BASE Fachausschnitt AVL
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	intertextuality; text reuse; information retrieval; NLP; word embeddings
License:	info:eu-repo/semantics/openAccess

From Historical Newspapers to Machine-Readable Data: The Origami OCR Pipeline

Author: Liebl, Bernhard; Burghardt, Manuel

Published: 2020

Publisher: CEUR-WS.org

Full text:	https://ul.qucosa.de/id/qucosa%3A92168 https://ul.qucosa.de/api/qucosa%3A92168/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-921689

While historical newspapers recently have gained a lot of attention in the digital humanities, transforming them into machine-readable data by means of OCR poses some major challenges. In order to address these challenges, we have developed an end-to-end OCR pipeline named Origami. This pipeline is part of a current project on the digitization and quantitative analysis of the German newspaper “Berliner Börsen-Zeitung” (BBZ), from 1872 to 1931. The Origami pipeline reuses existing open source OCR components and on top offers a new configurable architecture for layout detection, a simple table recognition, a two-stage X-Y cut for reading order detection, and a new robust implementation for document dewarping. In this paper we describe the different stages of the workflow and discuss how they meet the above-mentioned challenges posed by historical newspapers.

Source:	BASE Fachausschnitt AVL
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	end-to-end OCR; historical newspapers; layout detection; deep neural networks
License:	info:eu-repo/semantics/openAccess

The Vectorian API – A Research Framework for Semantic Textual Similarity (STS) Searches

Author: Burghardt, Manuel; Liebl, Bernhard

Published: 2022

Publisher: DH2022 Local Organizing Committee

Full text:	https://ul.qucosa.de/id/qucosa%3A92317 https://ul.qucosa.de/api/qucosa%3A92317/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-923175

Source:	BASE Fachausschnitt AVL
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	Semantic Textual Similarity (STS)
License:	info:eu-repo/semantics/openAccess