Results for *

Showing results 1 to 4 of 4.

  1. Matching bibliographic data from publication lists with large databases using N-grams
    Published: 2014
    Publisher: KU Leuven, Fac. of Economics and Business, Leuven

    ZBW - Leibniz-Informationszentrum Wirtschaft, Kiel location
    No interlibrary loan
    Source: Union catalogs
    Language: English
    Media type: Book (monograph)
    Format: Online
    Series: MSI ; 1413
    Keywords: string matching; n-gram; edit distance; Levenshtein distance; information retrieval
    Extent: Online resource (27 pp.), graphs
  2. Inference for regression with variables generated from unstructured data
    Published: May 2024
    Publisher: CESifo, Munich, Germany

    Access: Publisher (free of charge)
    ZBW - Leibniz-Informationszentrum Wirtschaft, Kiel location
    DS 63
    No interlibrary loan

    The leading strategy for analyzing unstructured data uses two steps. First, latent variables of economic interest are estimated with an upstream information retrieval model. Second, the estimates are treated as “data” in a downstream econometric model. We establish theoretical arguments for why this two-step strategy leads to biased inference in empirically plausible settings. More constructively, we propose a one-step strategy for valid inference that uses the upstream and downstream models jointly. The one-step strategy (i) substantially reduces bias in simulations; (ii) has quantitatively important effects in a leading application using CEO time-use data; and (iii) can be readily adapted by applied researchers.

     

    Source: Union catalogs
    Language: English
    Media type: Book (monograph)
    Format: Online
    Series: CESifo working papers ; 11119 (2024)
    Keywords: unstructured data; information retrieval; topic modeling; Hamiltonian Monte Carlo; measurement error
    Extent: 1 online resource (approx. 61 pages), illustrations
  3. „The Vectorian“ – Eine parametrisierbare Suchmaschine für intertextuelle Referenzen [English: "The Vectorian": a parameterizable search engine for intertextual references]
    Published: 2020
    Publisher: Zenodo

    Source: BASE subject extract AVL
    Language: German
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: intertextuality; text reuse; information retrieval; NLP; word embeddings
    License: info:eu-repo/semantics/openAccess

  4. Into the bibliography jungle: using random forests to predict dissertations’ reference section

    Cited-works-lists in Humanities dissertations are typically the result of five years of work. However, despite the long-standing tradition of reference mining, no research has systematically untapped the bibliographic data of existing electronic thesis collections. One of the main reasons for this is the difficulty of creating a tagged gold standard for the around 300 pages long theses. In this short paper, we propose a page-based random forest (RF) prediction approach which uses a new corpus of Literary Studies Dissertations from Germany. Moreover, we will explain the handcrafted but computationally informed feature-selection process. The evaluation demonstrates that this method achieves an F1 score of 0.88 on this new dataset. In addition, it has the advantage of being derived from an interpretable model, where feature relevance for prediction is clear, and incorporates a simplified annotation process.

     

    Source: BASE subject extract AVL
    Language: English
    Media type: Conference publication
    Format: Online
    DDC classification: Literature and rhetoric (800)
    Keywords: electronic theses and dissertations; bibliographic reference parsing; information retrieval; machine learning
    License: info:eu-repo/semantics/openAccess