Textcavator
As of December 2025, I-Analyzer has been renamed Textcavator.
Textcavator is a tool for exploring corpora (large collections of texts). You can use Textcavator to find relevant documents or visualise broader trends within the corpus. Designed with and for researchers in the humanities and social sciences, Textcavator offers an accessible interface to search a wide variety of corpora, such as newspaper archives, online book reviews, and orations. Additional corpora are available for members of Utrecht University.
Many academic disciplines, particularly the Humanities and Social Sciences, have embraced digital technologies to process large amounts of text data. Text datasets can be used to quantify trends observed in close reading, or, conversely, to pinpoint sources which might be interesting for closer, manual analysis.
While many specialised software packages exist for tasks like syntactic analysis, topic modelling, or collocation highlighting, the CDH Research Software Lab (RSLab), in collaboration with Utrecht University Library, identified a gap in tools for searching and filtering digitised text corpora, such as newspaper archives, prior to further analysis steps. Existing software, such as Delpher, often focusses on specific datasets, which limits its general applicability.
Textcavator has been developed to bridge this gap. It allows searching and exploring text corpora, visualising trends, and downloading tables of text and metadata for further analysis. Textcavator is open-source software and freely available.
Adding your own corpus to Textcavator
If you are interested in having your own corpus added to Textcavator, please contact us at cdh@uu.nl to explore the possibilities.
Join the pilot for the new upload feature
The development team is currently working on a new upload feature for Textcavator. For the final development phase, they are looking for researchers who would like to test this functionality.
Do you have a dataset you would like to explore in Textcavator? Sign up for the pilot and help the RSLab further improve the tool.
Available corpora in Textcavator
- U-Blad (Utrecht University newspaper) print editions, 1969-2010
- Dutch newspaper collection, Royal Library, 1600-1876
- The Dutch Throne Speech, 1814-2023
- Hebrew epigraph collection, 769-849
- Goodreads reviews of translated literary texts, 2007-2022
- Judicial system Netherlands (court rulings), 1900-2022
- Digital Library for Dutch Literature (DBNL), 1200-1890
- Dutch parliamentary debates (Eerste Kamer & Tweede Kamer), 1815-2022
Also available after logging in with a UU employee or student account (Solis-id):
- 19th Century US Newspapers, 1800-1900
- Dutch Annual Reports of (non)financial institutes, 1957-2008
- ECCO (Eighteenth Century Collections Online)
- Illustrated London News, 1842-2003
- International Herald Tribune, 1887-2013
- Le Figaro, 1854-1954
- The Economist, 1843-2021
- The Times newspaper archives, 1785-2010
- The Guardian-Observer archive, 1791-2003
- Periodicals archive, 19th century
- Punch, 1841-1992
Current projects
- Update of the Delpher newspaper corpus
- Improvement of the content and functionality of Textcavator, in collaboration with the Utrecht University Library (UBU)
The University Library holds many text corpora and archives, as well as its own digitised material. Through Textcavator, this material can be made searchable as well as findable, accessible, interoperable and reusable (FAIR). This project aims to increase the digital accessibility of the UBU collection and to enable modern forms of data-driven research with this material. We will do this by improving the delivery of UBU corpora, improving the functionality of Textcavator, and adding more, and more diverse, material to Textcavator. Furthermore, we want to invest in the accessibility of the material and in the digital literacy of students and researchers by increasing the visibility, accessibility and user-friendliness of Textcavator.
This project has four sub-goals:
- Improving the pipeline for adding (UBU) material to Textcavator
- Adding new corpora to Textcavator
- Improving the visibility of Textcavator
- Expanding the functionality of Textcavator