Corpus Linguistics

Course time: 
Tuesday/Friday 11:00 AM-12:50 PM
JSB 108

Corpus data is essential to many approaches to linguistics, including usage-based approaches to grammar, variationist sociolinguistics, and historical linguistics. Corpus building and evaluation have advanced tremendously over the past two decades, but the barriers to constructing one’s own corpus can be daunting: annotation interfaces are difficult to learn, natural language processing tools can be highly complex to work with, and handling data requires more than basic computer skills. In this hands-on course, we will learn to apply corpus methods to a dataset created during the course itself, focusing on the growing and challenging domain of social media. We will learn practical annotation schemes and consider how design choices affect our subsequent evaluation as we build and explore a small example corpus together.