Statistical Inference for the Linguistic and Non-Linguistic Past

Course time: Monday/Thursday 11:00 AM-12:50 PM

After the course, you will (i) be able to critically read current literature in linguistic phylogenetics (the science of automatically constructing language-family trees from linguistic data) and in spatial analyses of linguistic data; (ii) know how to perform phylogenetic and spatial analyses yourself by applying established methods, ranging from descriptive to Bayesian; and (iii) be well positioned to acquire the deeper background in computational statistics, phylogenetics, and spatial statistics that is required to develop new and better methods for inference from linguistic data.

Learning about past events from modern-day linguistic distributions has been one of the tasks of linguistics since the 19th century. However, by the end of the 20th century, both glottochronology and linguistic geography had been largely dismissed as unreliable. Statistical inference seemed ultimately unsuitable for linguistic data because of the stochastic nature of linguistic change.

Meanwhile, biologists have been facing very similar problems: for instance, rates of genetic mutation are far from constant, just like rates of lexical change, and many evolutionary events leave very similar signatures in biological data, making them hard to tease apart. The new discipline of bioinformatics demonstrated that with enough resolve, a generous serving of math, and frequently considerable computing power, such challenges can often be overcome. Over the last two decades, methods from bioinformatics have increasingly made their way into linguistics.

This course is an introduction to statistical methods from bioinformatics as applied to language. The first half covers linguistic phylogenetics: it provides a hands-on, practical introduction to state-of-the-art methods in the area and specifically discusses when, and to what extent, they are reliable. The second half concerns spatial distributions of linguistic varieties. Spatial statistics for linguistic data is less developed than phylogenetics, but it promises to become more important in the next decade. After covering descriptive techniques, the course discusses routes for moving from description to inference in this domain.