Crosslinguistic Corpus Linguistics
Master-level course, University of Cologne, 2023
Co-taught with Maria Bardají i Farré
This course introduces the principles and methods of crosslinguistic corpus linguistics and shows how corpus-based approaches can be used to study linguistic phenomena across languages.
Course Overview
The course combines the theoretical foundations of corpus linguistics with practical applications in a crosslinguistic context. Students gain hands-on experience in corpus compilation, annotation, and analysis across multiple languages, working with both theoretical readings and software tools (ELAN, INCEpTION, and R).
Structure, Readings and Content
Date | Content | Readings | Laptop/Programs Needed |
---|---|---|---|
05.04. | Introduction | - | - |
12.04. | What is corpus linguistics? | - | - |
19.04. | How to build a corpus and sampling issues | 1) Biber (1993) – Representativeness 2) Evert (2006) – The Library Metaphor | - |
26.04. | Building corpora of smaller languages | 1) Seifart (2008) – Representativeness of language documentation 2) McEnery & Ostler (2000) – A new agenda for corpus linguistics | - |
03.05. | Types of corpora | Gatto (2014) – The Web as corpus (Chapter 2) | Laptop |
10.05. | Corpus annotation | Beck et al. (2020) – Representation Problems | - |
17.05. | Corpus annotation | Blache et al. (2017) – The corpus of interactional data | Laptop |
24.05. | Comparable and parallel corpora. Corpus-based typology | 1) Haig, Schnell & Wegener (2012) – Comparing corpora from endangered language projects 2) Levshina (2017) – Parallel corpus of film subtitles | - |
31.05. | No lecture | - | - |
07.06. | Hands-on session: ELAN | - | Laptop + headphones; Install ELAN |
14.06. | Corpus linguistics for language documentation and grammar writing | 1) Cox (2011) – Corpus linguistics and language documentation 2) Mosel (2014) – Corpus linguistic and documentary approaches | - |
21.06. | Hands-on session: INCEpTION | - | Laptop; Install INCEpTION |
28.06. | Hands-on session: R | - | Laptop; Install R and RStudio |
05.07. | Hands-on session: R | - | Laptop |
12.07. | No lecture | - | - |
Learning Objectives
- Understand the theoretical foundations of crosslinguistic corpus linguistics
- Learn practical skills in corpus compilation and management for diverse languages
- Develop competency in corpus annotation tools (ELAN, INCEpTION)
- Get acquainted with statistical analysis techniques using R for corpus data
- Apply corpus methods to investigate crosslinguistic phenomena
- Critically evaluate crosslinguistic corpus studies
- Design and implement corpus-based research projects
Key Tools and Software
- ELAN: For multimedia annotation and time-aligned transcription
- INCEpTION: For collaborative text annotation and machine learning-assisted annotation
- R and RStudio: For statistical analysis and data visualization of corpus data