Syllabus
1 Language Analytics in R
Seminar in the Doctoral School HSS at the University of Luxembourg | Summer semester 2026
Initial steps:
- Install R
- Install RStudio
- Install these packages in RStudio: tidyverse, languageR, tidytext, quanteda, quanteda.textmodels, quanteda.textplots, quanteda.textstats, quanteda.tidy, gutenbergr
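The package list above can be installed in one step from the R console. The following is a minimal sketch, not an official setup script: the helper name `install_course_packages()` is hypothetical, and it assumes every listed package is available from the repository configured in your R session. If a package is not found there, follow the installation instructions in that package's own documentation.

```r
# Packages named in the list above.
course_packages <- c(
  "tidyverse", "languageR", "tidytext", "quanteda",
  "quanteda.textmodels", "quanteda.textplots", "quanteda.textstats",
  "quanteda.tidy", "gutenbergr"
)

# Listed packages that are not yet present in the current library.
missing_packages <- setdiff(course_packages, rownames(installed.packages()))

# Hypothetical helper: installs only the packages that are still missing.
# Assumes all of them can be fetched from the configured repository.
install_course_packages <- function(pkgs = missing_packages) {
  if (length(pkgs) > 0) {
    install.packages(pkgs)
  }
  invisible(pkgs)
}
```

Running `install_course_packages()` once in the RStudio console should be enough. Checking `missing_packages` first shows which packages would be installed.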
1.1 Syllabus
Five sessions
- Introduction:
  - Examples from own research
  - First steps in RStudio, packages and basic constructs (workflow, variables, scripting in R Markdown; writing good code)
  - Work with text data and corpora (data import etc.)
  - Hands-on exercises on toy data
- Data wrangling
  - tidyverse
- Corpus analysis with tidytext and quanteda
- Visualisation and basic statistics with ggplot2
- Work on own projects
1.2 Resources and recommended reading
Winter (2020): https://appliedstatisticsforlinguists.org/bwinter_stats_proofs.pdf
https://www.tidytextmining.com/
https://eleanorchodroff.com/r-for-linguists/
https://jofrhwld.github.io/AandS500_2023/
https://jofrhwld.github.io/2023_Lin611/
1.3 R cheatsheets
1.4 Stylometrics
1.5 AI tools
References
Levshina, Natalia. 2015. How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/z.195.
Schneider, Gerold. 2024. Text Analytics for Corpus Linguistics and Digital Humanities: Simple R Scripts and Tools. Language, Data Science and Digital Humanities. London: Bloomsbury.
Winter, Bodo. 2020. Statistics for Linguists: An Introduction Using R. London: Routledge. https://www.taylorfrancis.com/books/9781315165547.