Conference paper

Using Full Text Indices for Querying Spoken Language Data

As part of the ZuMult project, we are currently modelling a backend architecture that should provide query access to corpora from the Archive of Spoken German (AGD) at the Leibniz Institute for the German Language (IDS). We are exploring how to reuse existing search engine frameworks that provide full-text indices and allow corpora to be queried with one of the corpus query languages (QLs) established and actively used in the corpus research community. For this purpose, we tested MTAS, an open-source Lucene-based search engine for querying text with multilevel annotations. We applied MTAS to three oral corpora stored in the TEI-based ISO standard for transcriptions of spoken language (ISO 24624:2016). These corpora differ from the corpus data that MTAS was developed for because they include interactions with two or more speakers and are enriched, inter alia, with timeline-based annotations. In this contribution, we report our test results and address issues that arise when search frameworks originally developed for querying written corpora are transferred to the field of spoken language.

Authors: Frick, Elena; Schmidt, Thomas

License: Attribution-NonCommercial 4.0 International

Language: English

Subject

Corpus <Linguistics>

Event

Intellectual creation

(who)

Frick, Elena
Schmidt, Thomas

(when)

2020-05-12

Event

Publication

(who)

Paris : European Language Resources Association

URN: urn:nbn:de:bsz:mh39-98143

Last updated: 14.09.2023, 08:26 CEST

Data partner

Leibniz-Institut für Deutsche Sprache - Bibliothek


Object type

Conference paper

Contributors

Frick, Elena
Schmidt, Thomas
Paris : European Language Resources Association

Created

2020-05-12

Similar objects (12)

Conference paper

EXMARaLDA and the FOLK tools – two toolsets for transcribing and annotating spoken language

Contribution to a periodical

EXMARaLDA - Creating, Analysing and Sharing Spoken Language Corpora for Pragmatic Research

Book chapter

Querying Interaction Structure: Approaches to Overlap in Spoken Language Corpora

Article

Accessing spoken language corpora: an overview of current approaches

Article

Gesprochene Lernerkorpora: Methodisch-technische Aspekte der Erhebung, Erschließung und Nutzung

Conference paper

Creating and working with spoken language corpora in EXMARaLDA

Conference paper

The database for spoken German - DGD2

Conference paper

FOLK-Gold ― A gold standard for part-of-speech-tagging of spoken German

Book chapter

Reconstruction of separable particle verbs in a corpus of spoken German

Contribution to a periodical

Linguistic tool development between community practices and technology standards

Article

Construction and dissemination of a corpus of spoken interaction - tools and workflows in the FOLK project

Contribution to a periodical

New and future developments in EXMARaLDA


