Crosslinguistic Corpus Linguistics
Master-level course, University of Cologne, 2023
Co-taught with Maria Bardají i Farré
This master-level course introduces students to the principles and methods of crosslinguistic corpus linguistics, exploring how corpus-based approaches can be applied to study linguistic phenomena across different languages.
Course Overview
The course covers the theoretical foundations and practical applications of corpus linguistics in a crosslinguistic context and gives students hands-on experience in corpus compilation, annotation, and analysis across multiple languages. Students work with both theoretical readings and software tools, including ELAN, INCEpTION, and R.
Structure, Readings and Content
| Date | Content | Readings | Laptop/Programs Needed | 
|---|---|---|---|
| 05.04. | Introduction | - | - | 
| 12.04. | What is corpus linguistics? | - | - | 
| 19.04. | How to build a corpus and sampling issues | 1) Biber (1993) – Representativeness 2) Evert (2006) – The Library Metaphor | - | 
| 26.04. | Building corpora of smaller languages | 1) Seifart (2008) – Representativeness of language documentation 2) McEnery & Ostler (2000) – A new agenda for corpus linguistics  | - | 
| 03.05. | Types of corpora | Gatto (2014) – The Web as corpus (Chapter 2) | Laptop | 
| 10.05. | Corpus annotation | Beck et al. (2020) – Representation Problems | - | 
| 17.05. | Corpus annotation | Blache et al. (2017) – The corpus of interactional data | Laptop | 
| 24.05. | Comparable and parallel corpora. Corpus-based typology | 1) Haig, Schnell & Wegener (2012) – Comparing corpora from endangered language projects 2) Levshina (2017) – Parallel corpus of film subtitles  | - | 
| 31.05. | No lecture | - | - | 
| 07.06. | Hands-on session: ELAN | - | Laptop + headphones; Install ELAN | 
| 14.06. | Corpus linguistics for language documentation and grammar writing | 1) Cox (2011) – Corpus linguistics and language documentation 2) Mosel (2014) – Corpus ling. and documentary approaches  | - | 
| 21.06. | Hands-on session: INCEpTION | - | Laptop; Install INCEpTION | 
| 28.06. | Hands-on session: R | - | Laptop; Install R and RStudio | 
| 05.07. | Hands-on session: R | - | Laptop | 
| 12.07. | No lecture | - | - | 
Learning Objectives
- Understand the theoretical foundations of crosslinguistic corpus linguistics
- Learn practical skills in corpus compilation and management for diverse languages
- Develop competency in corpus annotation tools (ELAN, INCEpTION)
- Get acquainted with statistical analysis techniques using R for corpus data
- Apply corpus methods to investigate crosslinguistic phenomena
- Critically evaluate crosslinguistic corpus studies
- Design and implement corpus-based research projects
 
Key Tools and Software
- ELAN: For multimedia annotation and time-aligned transcription
- INCEpTION: For collaborative text annotation and machine learning-assisted annotation
- R and RStudio: For statistical analysis and data visualization of corpus data
 
