Results for *

Showing results 1 to 2 of 2.

  1. Peeking Inside the DH Toolbox - Detection and Classification of Software Tools in DH Publications
    Published: 2022
    Publisher: CEUR-WS.org

    Digital tools have played an important role in Digital Humanities (DH) since its beginnings. Accordingly, a lot of research has been dedicated to the documentation of tools as well as to the analysis of their impact from an epistemological perspective. In this paper we propose a binary and a multi-class classification approach to detect and classify tools. The approach builds on state-of-the-art neural language models. We test our model on two different corpora and report the results for different parameter configurations in two consecutive experiments. In the end, we demonstrate how the models can be used for actual tool detection and tool classification tasks in a large corpus of DH journals.

     

    Source: BASE Fachausschnitt AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: environmental humanities; computational literary studies; text mining; ecology; biodiversity; content analysis; literature
    License: info:eu-repo/semantics/openAccess

  2. Into the bibliography jungle: using random forests to predict dissertations’ reference section

    Cited-works lists in Humanities dissertations are typically the result of five years of work. However, despite the long-standing tradition of reference mining, no research has systematically tapped the bibliographic data of existing electronic thesis collections. One of the main reasons for this is the difficulty of creating a tagged gold standard for theses that run to around 300 pages. In this short paper, we propose a page-based random forest (RF) prediction approach which uses a new corpus of Literary Studies dissertations from Germany. Moreover, we explain the handcrafted but computationally informed feature-selection process. The evaluation demonstrates that this method achieves an F1 score of 0.88 on this new dataset. In addition, it has the advantage of being derived from an interpretable model, where feature relevance for prediction is clear, and it incorporates a simplified annotation process.

     

    Source: BASE Fachausschnitt AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: electronic theses and dissertations; bibliographic reference parsing; information retrieval; machine learning
    License: info:eu-repo/semantics/openAccess