Syllabus

Author

Peter Gilles

1 Language Analytics in R

Seminar in the Doctoral School HSS at the University of Luxembourg | Summer semester 2026

Initial steps:

  • install R

  • install RStudio

  • install these packages in RStudio: tidyverse, languageR, tidytext, quanteda, quanteda.textmodels, quanteda.textplots, quanteda.textstats, quanteda.tidy, gutenbergr
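The package installation step can be sketched in a few lines of R, run from the RStudio console. This is a minimal sketch, not an official setup script: it assumes every package in the list is available from your configured CRAN mirror (if one is not found there, check that package's project page for installation instructions). The variable names `required` and `to_install` are illustrative.

```r
# Packages listed in the initial steps
required <- c(
  "tidyverse", "languageR", "tidytext", "quanteda",
  "quanteda.textmodels", "quanteda.textplots", "quanteda.textstats",
  "quanteda.tidy", "gutenbergr"
)

# Packages from the list that are not yet installed on this machine
to_install <- setdiff(required, rownames(installed.packages()))

# Install only what is missing. The interactive() guard is an assumption
# of this sketch: it keeps the script from downloading anything when it
# is run non-interactively (e.g. via Rscript).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running the snippet a second time should leave `to_install` empty if everything was installed successfully.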

1.1 Syllabus

Five sessions

  1. Introduction:
    1. Examples from own research
    2. First steps in RStudio: packages and basic constructs (workflow, variables, scripting in R Markdown, writing good code)
    3. Work with text data and corpora (data import etc.)
    4. Hands-on exercises on toy data
  2. Data wrangling
    1. tidyverse
  3. Corpus analysis with tidytext and quanteda
  4. Visualisation and basic statistics with ggplot2
  5. Work on your own projects
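As a preview of how sessions 2 to 4 fit together, here is a minimal sketch on toy data: two sentences are split into words with tidytext, counted with dplyr, and prepared for plotting with ggplot2. The sentences and the names `toy`, `word_counts` and `p` are invented for illustration and are not course material; the sketch assumes dplyr, tidytext and ggplot2 are installed.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Toy corpus: one row per document (invented example sentences)
toy <- tibble(
  doc  = c("a", "b"),
  text = c("The cat sat on the mat.", "The dog chased the cat.")
)

# Tokenise into one word per row (lowercased, punctuation dropped by
# default), then count how often each word occurs
word_counts <- toy %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)

# Bar chart of word frequencies, most frequent word at the top
p <- ggplot(word_counts, aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Frequency", y = "Word")

# Call print(p) to draw the plot
```

With these two sentences, `word_counts` has seven rows, with "the" (4 occurrences) first and "cat" (2 occurrences) next.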

1.3 R cheatsheets

https://posit.co/resources/cheatsheets/

https://cran.r-project.org/doc/contrib/Short-refcard.pdf

1.4 Stylometry

https://computationalstylistics.github.io/

1.5 AI tools

querychat

References

Levshina, Natalia. 2015. How to Do Linguistics with R: Data Exploration and Statistical Analysis. Amsterdam: John Benjamins Publishing Company. https://doi.org/10.1075/z.195.
Schneider, Gerold. 2024. Text Analytics for Corpus Linguistics and Digital Humanities: Simple R Scripts and Tools. Language, Data Science and Digital Humanities. London: Bloomsbury.
Winter, Bodo. 2020. Statistics for Linguists: An Introduction Using R. London: Routledge. https://www.taylorfrancis.com/books/9781315165547.