Centre for Digital Humanities

Data Science: Introduction to Text Mining with R (Utrecht Summer School)

Event details

Date:
11 July 2022 - 14 July 2022
Time:
All Day
Location:
Utrecht, exact location to be announced

Applications of text mining are everywhere: social media, web search, advertising, email, customer service, healthcare, and marketing. In this course, students will learn how to apply text mining methods to text data and analyse it in a pipeline with statistical learning algorithms. The course has a strong practical, hands-on focus, and students will gain experience in using and interpreting text mining on example data from the humanities, social sciences, and healthcare.

From the social sciences to the humanities and healthcare, a major portion of data is now contained in text. However, text is unstructured information, which is difficult to process automatically. Text mining can be applied to create a more structured representation of a text, making its content more accessible to researchers. This course offers a thorough introduction to text mining with R. Students will gain experience in applying text mining to real data, for example from the social sciences and healthcare domains, and in interpreting the results. Through lectures and practicals, students will learn the skills necessary to design, implement, and understand their own text mining pipeline. Topics include regular expressions, text preprocessing, text classification and clustering, and word embedding approaches for text data.

Participants will learn to:

  • Understand and explain the fundamental approaches to text mining
  • Understand and apply current methods for analysing texts
  • Understand how text is handled, manipulated, preprocessed and cleaned
  • Define a text mining pipeline given a practical data science problem
  • Implement generic text mining tools such as regular expressions, text clustering, text classification, sentiment analysis, and word embeddings

The course starts at a very basic level and builds up gradually. By the end of the course, participants will have mastered text mining skills in R. Participants should have basic knowledge of data science and of scripting in R.
Application deadline: 27 June 2022

This course is part of a series of five courses in the Summer School Data Science specialisation, taught by Utrecht University’s Department of Methodology & Statistics.

More info & registration