Event details

Date:
14/07/2025 - 18/07/2025
Time:
All Day
Venue:
Utrecht Science Park: Specific location TBA
USP, Utrecht
For:
For humanities researchers, For humanities students, For humanities teachers, Open to all

In this course, students will learn how to apply text mining and NLP methods to text data and analyse these data in a pipeline with machine learning and deep learning algorithms. The course has a strong practical, hands-on focus: students will gain experience in applying text mining to real data from the social sciences, humanities, and healthcare, and in interpreting the results.

Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyse, classify, and interpret these kinds of data. Text mining and NLP techniques can be applied to create a structured representation of text, making its content more accessible to researchers. Applications of text mining are everywhere: social media, web search, advertising, email, customer service, healthcare, marketing, and more. This course offers an extensive exploration of text mining with Python. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include text preprocessing, text classification, topic modelling, word embeddings, deep learning models, large language models, prompting, and responsible text mining.

This course works best for learners who are comfortable programming in Python, who want to acquire skills in text mining approaches, and who have a basic knowledge of machine learning.

Participants from computer science and related disciplines, as well as diverse fields such as sociology, psychology, education, medicine, statistics, and beyond, will benefit from the course. 

Data Science specialisation
This course can be taken on its own, but it is also part of a series of eight courses in the Summer School Data Science specialisation taught by UU’s Department of Methodology & Statistics:

  1. Data Science: Programming with Python (Course code S17, 7-11 July 2025)
  2. Data Science: Network Science (Course code S37, 7-11 July 2025)
  3. Data Science: Statistical Programming with R (Course code S24, 14-18 July 2025)
  4. Data Science: Applied Text Mining (this course)
  5. Data Science: Machine Learning with Python (Course code S70, 21-25 July 2025)
  6. Data Science: Advanced Techniques for Handling Missing Data (Course code S28, 2026)
  7. Data Science: Data Analysis (Course code S31, 2026)
  8. Data Science: Text Mining with R (Course code S41, 2026)