Centre for Digital Humanities

Events

Data Science: Text Mining with R (Utrecht Summer School)

Event details

Date: 19 August 2024 - 22 August 2024
Time: All Day
Venue: Victor J. Koningsberger building, Budapestlaan 4a-b, Utrecht, 3584 CD

From Monday 19 August to Thursday 22 August, Utrecht University’s Faculty of Social and Behavioural Sciences offers a Utrecht Summer School course on text mining with R.

This course has a strong practical, hands-on focus: students will gain experience in applying text mining and interpreting its results, using example data from the humanities, social sciences, and healthcare.

A large share of today's data is stored as text. Text, however, is unstructured information, which makes it difficult to process automatically. Text mining addresses this by creating a more structured representation of a text, making its content more accessible to researchers.

This course offers a thorough introduction to text mining with R. Students will gain experience in using text mining on real data and in interpreting the results. Through lectures and practicals, they will learn the skills needed to design, implement, and understand their own text mining pipeline. The topics in this course include regular expressions, text preprocessing, text classification and clustering, and word embedding approaches for text data.

The course deals with:

  • Understanding and explaining the fundamental approaches to text mining
  • Understanding and applying current methods for analyzing texts
  • Understanding how text is handled, manipulated, preprocessed and cleaned
  • Defining a text mining pipeline given a practical data science problem
  • Implementing generic text mining tools such as regular expressions, text clustering, text classification, sentiment analysis, and word embeddings

The course starts at a very basic level and builds up gradually. At the end of the course, participants will have mastered text mining skills with R.

Participants should have a basic knowledge of data science and scripting in R.

The deadline for registration is 5 August. More information can be found here.