Search Results

Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications

Author: Zielinski, Andrea; Mutschke, Peter

Published: 2018

Publisher: DEU

Full text:	https://www.ssoar.info/ssoar/handle/document/57723
Link for citation:	http://nbn-resolving.org/urn:nbn:de:0168-ssoar-57723-2

In this paper, we describe our effort to create a new corpus for the evaluation of detecting and linking so-called survey variables in social science publications (e.g., "Do you believe in Heaven?"). The task is to recognize survey variable mentions in a given text, disambiguate them, and link them to the corresponding variable within a knowledge base. Since there are generally hundreds of candidates to link to and due to the wide variety of forms they can take, this is a challenging task within NLP. The contribution of our work is the first gold standard corpus for the variable detection and linking task. We describe the annotation guidelines and the annotation process. The produced corpus is multilingual - German and English - and includes manually curated word and phrase alignments. Moreover, it includes text samples that could not be assigned to any variables, denoted as negative examples. Based on the new dataset, we conduct an evaluation of several state-of-the-art text classification and textual similarity methods. The annotated corpus is made available along with an open-source baseline system for variable mention identification and linking.

Source:	BASE Selection for Comparative Literature
Language:	Undetermined
Media type:	Conference object
Format:	Online
Parent title:	Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC) ; International Conference on Language Resources and Evaluation (LREC) ; 11
DDC Categories:	800; 070
Subjects:	News media; journalism; publishing; Literature; rhetoric and criticism; text mining; semantic textual similarity; paraphrase detection; linking; Information Science; Science of Literature; Linguistics; social science; publication; data; algorithm; computational linguistics
Rights:	Creative Commons - Attribution-Noncommercial-No Derivative Works 4.0 ; info:eu-repo/semantics/openAccess

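Un Canon littéraire européen? : Actes du colloque international de Bonn des 26, 27 et 28 mars 2014 [A European Literary Canon? Proceedings of the International Colloquium in Bonn, 26, 27 and 28 March 2014]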

Published: 2017

Publisher: Cultures européennes – identité européenne

Link for citation:	https://hdl.handle.net/20.500.11811/702

Source:	BASE Selection for Comparative Literature
Language:	French
Media type:	Conference object
Format:	Online
DDC Categories:	800; 840
Rights:	In Copyright ; rightsstatements.org/vocab/InC/1.0/ ; openAccess

Machine-readable text corpora and the linguistic description of languages

Author: Mair, Christian

Published: 2017

Publisher: DEU ; Mannheim

Full text:	http://www.ssoar.info/ssoar/handle/document/49753
Link for citation:	http://nbn-resolving.org/urn:nbn:de:0168-ssoar-49753-1

"To understand the role of machine-readable text corpora in linguistics it is necessary to consider the four possible sources of data for the linguist, viz. (1) the analyst's own introspection/intuition, (2) more or less systematically conducted elicitation experiments with groups of native speakers of the language studied, (3) collections of authentic spoken or written citations gathered unsystematically, and (4) evidence extracted systematically from a well-defined corpus of texts. After a discussion of the advantages and disadvantages of the various sources of data, I will briefly exemplify recent advances made in the corpus-based description of languages that have become possible as a result of the application of computer technology to linguistics and then go on to present the major databases currently available for the study of English and German." (author's abstract)

Source:	BASE Selection for Comparative Literature
Language:	Undetermined
Media type:	Article (edited volume); Article (journal); Conference object
Format:	Online
Parent title:	Text analysis and computers ; 1 ; ZUMA-Nachrichten Spezial ; 64-75 ; Text Analysis and Computers Conference
DDC Categories:	800
Subjects:	Literature; rhetoric and criticism; Science of Literature; Linguistics; text analysis; language; computational linguistics; data capture
Rights:	Deposit Licence - No Redistribution, No Modifications

Mining Social Science Publications for Survey Variables

Author: Zielinski, Andrea; Mutschke, Peter

Published: 2018

Publisher: MISC

Full text:	https://www.ssoar.info/ssoar/handle/document/57722 http://www.aclweb.org/anthology/W17-2907
Link for citation:	http://nbn-resolving.org/urn:nbn:de:0168-ssoar-57722-7

Research in Social Science is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations a reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding modest improvements over the baseline.

Source:	BASE Selection for Comparative Literature
Language:	Undetermined
Media type:	Conference object
Format:	Online
Parent title:	Proceedings of the Second Workshop on NLP and Computational Social Science ; 47-52
DDC Categories:	800; 070
Subjects:	Literature; rhetoric and criticism; News media; journalism; publishing; OpenMinTed; Information Science; Science of Literature; Linguistics; publication; technical literature; artificial intelligence; computational linguistics; survey; social science; concept; algorithm; periodical; construction of indicators; data capture
Rights:	Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 ; info:eu-repo/semantics/openAccess
