Crosslinguistic Corpus Linguistics

Master-level course, University of Cologne, 2023

Co-taught with Maria Bardají i Farré

This master-level course introduces students to the principles and methods of crosslinguistic corpus linguistics, exploring how corpus-based approaches can be applied to study linguistic phenomena across different languages.

Course Overview

The course covers theoretical foundations and practical applications of corpus linguistics in a crosslinguistic context, providing students with hands-on experience in corpus compilation, annotation, and analysis across multiple languages. Students engage with both theoretical readings and practical software tools including ELAN, INCEpTION, and R.

Structure, Readings and Content

Date | Content | Readings | Laptop/Programs Needed
05.04. | Introduction | - | -
12.04. | What is corpus linguistics? | - | -
19.04. | How to build a corpus and sampling issues | 1) Biber (1993) – Representativeness; 2) Evert (2006) – The Library Metaphor | -
26.04. | Building corpora of smaller languages | 1) Seifart (2008) – Representativeness of language documentation; 2) McEnery & Ostler (2000) – A new agenda for corpus linguistics | -
03.05. | Types of corpora | Gatto (2014) – The Web as corpus (Chapter 2) | Laptop
10.05. | Corpus annotation | Beck et al. (2020) – Representation Problems | -
17.05. | Corpus annotation | Blache et al. (2017) – The corpus of interactional data | Laptop
24.05. | Comparable and parallel corpora. Corpus-based typology | 1) Haig, Schnell & Wegener (2012) – Comparing corpora from endangered language projects; 2) Levshina (2017) – Parallel corpus of film subtitles | -
31.05. | No lecture | - | -
07.06. | Hands-on session: ELAN | - | Laptop + headphones; Install ELAN
14.06. | Corpus linguistics for language documentation and grammar writing | 1) Cox (2011) – Corpus linguistics and language documentation; 2) Mosel (2014) – Corpus ling. and documentary approaches | -
21.06. | Hands-on session: INCEpTION | - | Laptop; Install INCEpTION
28.06. | Hands-on session: R | - | Laptop; Install R and RStudio
05.07. | Hands-on session: R | - | Laptop
12.07. | No lecture | - | -

Learning Objectives

  • Understand the theoretical foundations of crosslinguistic corpus linguistics
  • Learn practical skills in corpus compilation and management for diverse languages
  • Develop competency in corpus annotation tools (ELAN, INCEpTION)
  • Get acquainted with statistical analysis techniques for corpus data using R
  • Apply corpus methods to investigate crosslinguistic phenomena
  • Critically evaluate crosslinguistic corpus studies
  • Design and implement corpus-based research projects

Key Tools and Software

  • ELAN: For multimedia annotation and time-aligned transcription
  • INCEpTION: For collaborative and machine-learning-assisted text annotation
  • R and RStudio: For statistical analysis and data visualization of corpus data