Textcavator | Centre for Digital Humanities

As of December 2025, I-Analyzer has been renamed Textcavator.

Textcavator is a tool for exploring corpora (large collections of texts). You can use Textcavator to find relevant documents or visualise broader trends within the corpus. Designed with and for researchers in the humanities and social sciences, Textcavator offers an accessible interface to search a wide variety of corpora, such as newspaper archives, online book reviews, and orations. Additional corpora are available for members of Utrecht University.

Many academic disciplines, particularly the Humanities and Social Sciences, have embraced digital technologies to process large amounts of text data. Text datasets can be used to quantify trends observed in close reading, or, conversely, to pinpoint sources which might be interesting for closer, manual analysis.

While many specialised software packages exist for tasks like syntactic analysis, topic modelling, or collocation highlighting, the CDH Research Software Lab (RSLab), in collaboration with Utrecht University Library, identified a gap in tools for searching and filtering digitised text corpora, such as newspaper archives, prior to further analysis steps. Existing software, such as Delpher, often focusses on specific datasets, which limits its broader applicability.

Textcavator has been developed to bridge this gap. It allows searching and exploring text corpora, visualising trends, and downloading tables of text and metadata for further analysis. Textcavator is open-source software and freely available.

Adding your own corpus to Textcavator

If you are interested in having your own corpus added to Textcavator, please contact us at cdh@uu.nl to explore the possibilities.

Join the pilot for the new upload feature

The development team is currently working on a new upload feature for Textcavator. For the final development phase, they are looking for researchers who would like to test this functionality.

Do you have a dataset you would like to explore in Textcavator? Sign up for the pilot and help the RSLab further improve the tool.

Available corpora in Textcavator

Also available after login with UU employee or student account (Solis-id):

Current projects

Update of the Delpher newspaper corpus
Improvement of the content and functionality of Textcavator, in collaboration with the Utrecht University Library (UBU)

The University Library holds many text corpora and archives, as well as its own digitised material. Via Textcavator, this material can be made searchable as well as findable, accessible, interoperable and reusable (FAIR). This project aims to increase the digital accessibility of the UBU collection and to enable modern forms of data-driven research with this material. We will do this by improving the delivery of UBU corpora, improving the functionality of Textcavator, and adding more, and more diverse, material to Textcavator. Furthermore, we want to invest in the accessibility of the material and the digital literacy of students and researchers by increasing the visibility, accessibility and user-friendliness of Textcavator.

This project has four sub-goals:

Improving the pipeline for adding (UBU) material to Textcavator
Adding new corpora to Textcavator
Improving the visibility of Textcavator
Expansion of Textcavator functionalities

Interview with Luka van der Plas on the renewed Textcavator

FAIR Research IT interview with Berit Janssen
