Conference paper

Mining corpora of computer-mediated communication: analysis of linguistic features in Wikipedia talk pages using machine learning methods

Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), in which we apply machine learning methods to the analysis of big corpora in language-focused research on computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo, provided by the IDS Mannheim. We will investigate different representations of the data in order to integrate complex syntactic and semantic information into the SVM. The results are intended to foster both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.

Authors: Beißwenger, Michael; Lüngen, Harald; Margaretha, Eliza; Pölitz, Christian

Attribution - ShareAlike 4.0 International

Language: English

Subject

Corpus <linguistics>

Event

Intellectual creation

(who)

Beißwenger, Michael
Lüngen, Harald
Margaretha, Eliza
Pölitz, Christian

(when)

2014-11-03

Event

Publication

(who)

Hildesheim : Universität Hildesheim

URN: urn:nbn:de:gbv:hil2-opus-2893

Last update: 14.09.2023, 8:26 AM CEST

Data provider

Leibniz-Institut für Deutsche Sprache - Bibliothek

Object type

Conference paper

Other Objects (12)

Conference paper

Text type structure and logical document structure

Article

Building linguistic corpora from Wikipedia articles and discussions

DEREKO - Das Deutsche Referenzkorpus. Schriftkorpora der deutschen Gegenwartssprache am Institut für Deutsche Sprache in Mannheim

Monograph

Mining corpora of computer-mediated communication: Analysis of linguistic features in Wikipedia talk pages using machine learning methods

Conference paper

CMC Corpora in DeReKo

Book chapter

New German words: detection and description

Conference paper

Reply relations in CMC: types and annotation

Types and annotation of reply relations in computer-mediated communication

Article

DEREKO - Das Deutsche Referenzkorpus. Schriftkorpora der deutschen Gegenwartssprache am Institut für Deutsche Sprache in Mannheim

Book chapter

Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN

Book chapter

Reply relations in CMC: types and annotation

Building linguistic corpora from Wikipedia articles and discussions
