Results for *

Showing results 1 to 4 of 4.

  1. “Shakespeare in the Vectorian Age”: An evaluation of different word embeddings and NLP parameters for the detection of Shakespeare quotes
    Published: 2020
    Publisher:  International Committee on Computational Linguistics

    In this paper we describe an approach for the computer-aided identification of Shakespearean intertextuality in a corpus of contemporary fiction. We present the Vectorian, which is a framework that implements different word embeddings and various NLP parameters. The Vectorian works like a search engine, i.e. a Shakespeare phrase can be entered as a query, the underlying collection of fiction books is then searched for the phrase and the passages that are likely to contain the phrase, either verbatim or as a paraphrase, are presented in a ranked results list. While the Vectorian can be used via a GUI, in which many different parameters can be set and combined manually, in this paper we present an ablation study that automatically evaluates different embedding and NLP parameter combinations against a ground truth. We investigate the behavior of different parameters during the evaluation and discuss how our results may be used for future studies on the detection of Shakespearean intertextuality.

     

    Source: BASE Fachausschnitt AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: Vectorian; computer-aided identification
    License:

    info:eu-repo/semantics/openAccess

  2. „The Vectorian“ – Eine parametrisierbare Suchmaschine für intertextuelle Referenzen
    Published: 2020
    Publisher:  Zenodo

    Source: BASE Fachausschnitt AVL
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: intertextuality; text reuse; information retrieval; NLP; word embeddings
    License:

    info:eu-repo/semantics/openAccess

  3. From Historical Newspapers to Machine-Readable Data: The Origami OCR Pipeline
    Published: 2020
    Publisher:  CEUR-WS.org

    While historical newspapers recently have gained a lot of attention in the digital humanities, transforming them into machine-readable data by means of OCR poses some major challenges. In order to address these challenges, we have developed an end-to-end OCR pipeline named Origami. This pipeline is part of a current project on the digitization and quantitative analysis of the German newspaper “Berliner Börsen-Zeitung” (BBZ), from 1872 to 1931. The Origami pipeline reuses existing open source OCR components and on top offers a new configurable architecture for layout detection, a simple table recognition, a two-stage X-Y cut for reading order detection, and a new robust implementation for document dewarping. In this paper we describe the different stages of the workflow and discuss how they meet the above-mentioned challenges posed by historical newspapers.

     

    Source: BASE Fachausschnitt AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: end-to-end OCR; historical newspapers; layout detection; deep neural networks
    License:

    info:eu-repo/semantics/openAccess

  4. The Vectorian API – A Research Framework for Semantic Textual Similarity (STS) Searches
    Published: 2022
    Publisher:  DH2022 Local Organizing Committee

    Source: BASE Fachausschnitt AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: Semantic Textual Similarity (STS)
    License:

    info:eu-repo/semantics/openAccess