CDH workshop: Using AI to analyse historical texts
Recent years have seen great advances in the development of models that allow for the study of language in exciting ways. Yet, despite the availability of these tools and the enormous potential of their application to historical and other humanities data, their utilisation for research in the humanities has been challenging.
This workshop offers a gentle introduction to the use of contextual language models for humanities research. We invite all scholars and graduate students to apply our text-analysis infrastructure to their own research questions and/or data.
The workshop will be hosted by dr. Pim Huijnen (Assistant Professor of Digital Cultural History at Utrecht University and CDH affiliate) and Mees van Stiphout (CDH Research Software Lab), both members of the Semantics of Sustainability project. Members of the National Library and the eScience Center will provide additional support.
Description and objectives
The workshop will introduce researchers to the infrastructure developed in the Open eScience project Semantics of Sustainability, which allows users to do advanced text analysis on Dutch Parliamentary Data (1813-2022) and data provided by the National Library (KB) in The Hague: newspapers, books, magazines, and the ANP (20th Century). The infrastructure will be accessible through Jupyter Notebooks, a simple environment in which participants can explore and analyse these Dutch historical texts.
The aim of the workshop is to help participants apply these tools to their own research questions, while also creating a space for sharing, testing and discussing the use of AI tools for humanities research. Besides working with the named datasets, participants may use the provided infrastructure to work with data of their own (see the Registration section for requirements).
Our infrastructure offers functionalities such as:
- Generating personalised corpora from the original project data through a list of seed words.
- Searching for passages that contain one or more specific keywords, and for passages that are semantically related to them.
- Visualising these passages as semantic clusters, providing a sense of the various uses of the keyword(s).
- Adding chronological information to the clusters, so that the change in content and frequency of the clusters can be studied over time.
- Adding other metadata filters, so that the (change in) clusters can be studied in comparison between different political parties, newspaper titles, etc.
- Visualising the frequency of one or more keywords over time.
Depending on the provided data and keywords, these functionalities could allow for a variety of research questions, such as:
- How does parliamentary discourse reflect the conceptual origins of sustainability in the Netherlands before its formal definition in 1987?
- How did Dutch newspapers of different ideological orientations frame women’s rights during the 1960s and 1970s?
- How did Dutch political parties react in parliament to the Maastricht Treaty of 1992 and its implications?
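To give a rough sense of what two of the functionalities listed above involve (building a corpus from seed words and counting keyword frequency over time), here is a minimal Python sketch. It is not the workshop infrastructure: the records, field names (`year`, `text`) and function names are all hypothetical, and it relies on exact word matching rather than the contextual language models used in the project.

```python
import re
from collections import Counter, defaultdict

# Toy records standing in for project data; field names are assumptions.
RECORDS = [
    {"year": 1972, "text": "Het milieu en de groei van de economie."},
    {"year": 1972, "text": "Duurzame groei vraagt om zorg voor het milieu."},
    {"year": 1987, "text": "Duurzame ontwikkeling staat centraal in het rapport."},
    {"year": 1990, "text": "De begroting is aangenomen."},
]

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def select_by_seed_words(records, seed_words):
    """Keep only records that contain at least one seed word."""
    seeds = {w.lower() for w in seed_words}
    return [r for r in records if seeds & set(tokenize(r["text"]))]


def keyword_frequency_by_year(records, keywords):
    """Count occurrences of each keyword per year (exact token match)."""
    keys = {k.lower() for k in keywords}
    counts = defaultdict(Counter)
    for record in records:
        for token in tokenize(record["text"]):
            if token in keys:
                counts[record["year"]][token] += 1
    return {year: dict(c) for year, c in sorted(counts.items())}


corpus = select_by_seed_words(RECORDS, ["milieu", "duurzame"])
freq = keyword_frequency_by_year(corpus, ["milieu", "duurzame"])
print(freq)
```

The resulting per-year counts could then be plotted to follow a keyword over time. In practice one would likely normalise by the total number of words per year, since the size of the underlying sources varies considerably across periods.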
Practical information
This one-day workshop will take place on Thursday, 20 March 2025 at the National Library (Prins Willem-Alexanderhof 5, 2595 BE, The Hague). Route information can be found on the National Library website. More details will be provided to confirmed participants via email. The workshop is free of charge. Lunch and beverages will be provided.
Level
Basic prior experience with Python and Jupyter Notebooks is highly recommended, though an introduction to both will be provided for those who have not worked with them before.
The workshop will be held in English; the data we work with are in Dutch.
For whom?
This workshop is aimed at scholars and graduate students interested in using cutting-edge language models for humanities research. It is also open to anyone else who is interested.
Registration
Registration is now open, but places are limited. Interested participants should sign up as soon as possible by sending an email to Pim Huijnen at p.huijnen@uu.nl. The email should contain:
- Name,
- Contact email,
- Organisation/Affiliation,
- Background/Current Position,
- Level of experience with programming or with language models,
- Abstract of research idea (200-300 words),
- A list of keywords related to the research question from which the personalised corpora will be generated.
Participants who wish to use their own data during the workshop should also include a description of the dataset in the email. All submitted datasets must meet the following requirements:
- The data must be provided as CSV files.
- Relevant metadata must be included.
Further details will be provided after receiving your registration.
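Participants preparing their own dataset may want to run a quick check before submitting it. The Python sketch below is only an illustration: the column names (`text`, `date`, `source`) are hypothetical, since the metadata fields actually required are not specified here and will follow after registration.

```python
import csv
import io

# Hypothetical metadata columns; the workshop's actual requirements may differ.
REQUIRED_COLUMNS = ["text", "date", "source"]


def check_csv(handle, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in a CSV file object."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in required if c not in header]
    if problems:
        return problems
    # Line numbers assume one record per line (no multi-line quoted fields).
    for line_no, row in enumerate(reader, start=2):
        for column in required:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty value for {column}")
    return problems


sample = io.StringIO(
    "text,date,source\n"
    "Eerste document,1972-03-01,Krant A\n"
    "Tweede document,,Krant B\n"
)
print(check_csv(sample))
```

For a file on disk, the same function can be called as `check_csv(open("my_data.csv", newline="", encoding="utf-8"))`, where the file name is a placeholder.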
Organisation
This workshop is organised in tandem with a conference on AI and Dutch historical texts that will take place at the National Library in The Hague on 21 March 2025. Both events are organised by the Semantics of Sustainability project in collaboration with the National Library, the Netherlands eScience Center, and the Centre for Digital Humanities at Utrecht University.