Search results

Matching bibliographic data from publication lists with large databases using N-grams

Author(s): Abdulhayoglu, Mehmet Ali; Thijs, Bart; Jeuris, Wouter

Published: 2014

Publisher: KU Leuven, Faculty of Economics and Business, Leuven

Location: ZBW - Leibniz-Informationszentrum Wirtschaft, Kiel site

Interlibrary loan: not available

Source:	Union catalogues
Language:	English
Media type:	Book (monograph)
Format:	Online
Series:	MSI ; 1413
Keywords:	string matching; n-gram; edit distance; Levenshtein distance; information retrieval
Extent:	Online resource (27 pp.), graphs
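
This record lists only keywords (string matching, n-grams, Levenshtein edit distance), not the authors' procedure. As a rough illustration of those two measures, the sketch below compares a reference string from a publication list with candidate database records using a character-trigram Dice coefficient and a normalized Levenshtein distance. The function names, the normalization and padding scheme, n = 3 and the example strings are all assumptions for illustration; this is not the method of MSI report 1413.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Return the multiset of character n-grams of a normalized string.

    Lower-casing, collapsing whitespace and padding with one space on each
    side are illustrative assumptions, not taken from the report.
    """
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over n-gram multisets: 2|A & B| / (|A| + |B|)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * overlap / total


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def best_match(reference: str, candidates: list[str], n: int = 3) -> str:
    """Hypothetical helper: pick the candidate with the highest n-gram Dice score."""
    return max(candidates, key=lambda c: dice_similarity(reference, c, n))


if __name__ == "__main__":
    records = [
        "Inference for regression with variables generated from unstructured data",
        "Into the bibliography jungle: using random forests to predict dissertations' reference section",
    ]
    cited = "Battaglia et al.: Inference for regression w/ variables generated from unstructured data"
    print(best_match(cited, records))  # → the "Inference for regression ..." record
```

N-gram overlap is cheap to compute and tolerant of reordered or abbreviated words, while edit distance is stricter about character order; in practice one might use the n-gram score to shortlist candidates and the edit distance to confirm them, though whether the report does so is not stated here.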

Inference for regression with variables generated from unstructured data

Author(s): Battaglia, Laura; Christensen, Timothy; Hansen, Stephen; Sacher, Szymon

Published: May 2024

Publisher: CESifo, Munich, Germany

Access: Publisher (free of charge)

Location: ZBW - Leibniz-Informationszentrum Wirtschaft, Kiel site

Shelfmark: DS 63

Interlibrary loan: not available

The leading strategy for analyzing unstructured data uses two steps. First, latent variables of economic interest are estimated with an upstream information retrieval model. Second, the estimates are treated as “data” in a downstream econometric model. We establish theoretical arguments for why this two-step strategy leads to biased inference in empirically plausible settings. More constructively, we propose a one-step strategy for valid inference that uses the upstream and downstream models jointly. The one-step strategy (i) substantially reduces bias in simulations; (ii) has quantitatively important effects in a leading application using CEO time-use data; and (iii) can be readily adapted by applied researchers.
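
The abstract's claim that treating upstream estimates as "data" can bias downstream inference has a familiar textbook special case: classical measurement error in a regressor attenuates the OLS slope toward zero by the factor var(x) / (var(x) + var(error)). The simulation below illustrates only that special case under an assumed data-generating process (standard-normal latent variable, independent Gaussian estimation error, arbitrary sample size and seed). It is not the authors' one-step estimator, which the keywords suggest combines topic modeling with Hamiltonian Monte Carlo.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate(n=20_000, beta=1.0, noise_sd=1.0, seed=0):
    """Compare the slope on the true latent variable with the slope on a noisy estimate.

    Hypothetical data-generating process: latent ~ N(0, 1),
    y = beta * latent + N(0, 1), and the upstream "estimate" adds
    independent N(0, noise_sd**2) error to the latent variable.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * t + rng.gauss(0.0, 1.0) for t in latent]
    estimated = [t + rng.gauss(0.0, noise_sd) for t in latent]
    infeasible = ols_slope(latent, y)   # uses the unobserved truth
    two_step = ols_slope(estimated, y)  # treats the estimate as "data"
    return infeasible, two_step


if __name__ == "__main__":
    infeasible, two_step = simulate()
    # Theory: two_step is close to beta / (1 + noise_sd**2) = 0.5 here.
    print(f"infeasible slope: {infeasible:.3f}, two-step slope: {two_step:.3f}")
```

With `noise_sd=1.0` the estimated regressor is half signal and half noise, so the two-step slope lands near half the true effect; setting `noise_sd=0.0` removes the gap entirely.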

Source:	Union catalogues
Language:	English
Media type:	Book (monograph)
Format:	Online
Series:	CESifo working papers ; 11119 (2024)
Keywords:	unstructured data; information retrieval; topic modeling; Hamiltonian Monte Carlo; measurement error
Extent:	1 online resource (approx. 61 pages), illustrations

„The Vectorian“ – Eine parametrisierbare Suchmaschine für intertextuelle Referenzen (English: “The Vectorian”: a parameterizable search engine for intertextual references)

Author(s): Liebl, Bernhard; Burghardt, Manuel

Published: 2020

Publisher: Zenodo

Full text:	https://ul.qucosa.de/id/qucosa%3A92158 ; https://ul.qucosa.de/api/qucosa%3A92158/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-921583

Source:	BASE subject subset AVL
Language:	German
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	intertextuality; text reuse; information retrieval; NLP; word embeddings
License:	info:eu-repo/semantics/openAccess

Into the bibliography jungle: using random forests to predict dissertations’ reference section

Author(s): Gutiérrez De la Torre, Silvia E.; Niekler, Andreas; Equihua, Julián; Burghardt, Manuel

Published: 2022

Publisher: CEUR-WS.org

Full text:	https://ul.qucosa.de/id/qucosa%3A92321 ; https://ul.qucosa.de/api/qucosa%3A92321/attachment/ATT-0/
Citable link:	https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-923215

Cited-works lists in Humanities dissertations are typically the result of five years of work. However, despite the long-standing tradition of reference mining, no research has systematically tapped the bibliographic data of existing electronic thesis collections. One of the main reasons for this is the difficulty of creating a tagged gold standard for theses that are around 300 pages long. In this short paper, we propose a page-based random forest (RF) prediction approach that uses a new corpus of Literary Studies dissertations from Germany. Moreover, we explain the handcrafted but computationally informed feature-selection process. The evaluation shows that this method achieves an F1 score of 0.88 on the new dataset. In addition, it has the advantage of being derived from an interpretable model, in which the relevance of each feature for prediction is clear, and it relies on a simplified annotation process.
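
The abstract describes classifying whole pages with handcrafted features but does not list them here. The sketch below shows what page-level feature extraction of that kind could look like; the three features (share of lines containing a year, share of lines starting like "Surname, ", relative page position), the regular expressions and the function name are hypothetical, not the paper's feature set.

```python
import re

# Hypothetical cues; the paper's actual features are not given in this record.
YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")          # years 1500-2099
SURNAME_FIRST = re.compile(r"^[A-ZÄÖÜ][\w'-]+,\s")       # e.g. "Müller, "


def page_features(page_text: str, page_index: int, n_pages: int) -> dict:
    """Turn one page of a thesis into a small numeric feature vector.

    All three features are illustrative assumptions:
    - year_share: share of non-empty lines containing a year
    - surname_share: share of non-empty lines that start like "Surname, "
    - position: relative position of the page in the thesis, in (0, 1]
    """
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    if lines:
        year_share = sum(bool(YEAR.search(line)) for line in lines) / len(lines)
        surname_share = sum(bool(SURNAME_FIRST.match(line)) for line in lines) / len(lines)
    else:
        year_share = surname_share = 0.0
    return {
        "year_share": year_share,
        "surname_share": surname_share,
        "position": (page_index + 1) / n_pages,
    }


if __name__ == "__main__":
    page = "Müller, Hans: Titel. Berlin 1999.\nSchmidt, Anna (2005): Anderer Titel."
    print(page_features(page, page_index=8, n_pages=10))
```

Vectors like these, paired with page-level labels (bibliography page or not), could then be passed to a random forest such as scikit-learn's `RandomForestClassifier`, whose per-feature importances support the kind of interpretability the abstract mentions.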

Source:	BASE subject subset AVL
Language:	English
Media type:	Conference publication
Format:	Online
DDC classification:	Literature and rhetoric (800)
Keywords:	electronic theses and dissertations; bibliographic reference parsing; information retrieval; machine learning
License:	info:eu-repo/semantics/openAccess
