Centre for Digital Humanities

Portfolio

GrETEL 5

Key words

  • Linguistic search engine
  • Dutch treebanks

Short description

GrETEL 5 is a user-friendly, web-based application for searching syntactically annotated Dutch corpora, so-called treebanks. GrETEL, an acronym for Greedy Extraction of Trees for Empirical Linguistics, facilitates efficient exploration of linguistic data.

Background

The original GrETEL application was developed at the University of Leuven from 2012 onwards. Since 2017, the CDH Research Software Lab has been working on a new version, GrETEL 5, reworking the application and adding features. The following has been done:

  • Expansion of available corpora;
  • Introduction of a new user-friendly interface for uploading data and metadata, including cleaning and conversion of CHAT-, FoLiA- and TEI-formatted corpora;
  • Implementation of a query validation system with autocomplete functionality;
  • Incorporation of new features for analyzing treebank query results in terms of data, metadata and combinations of those;
  • Enhanced search capabilities, offering more options for example-based queries;
  • Adoption of a new Django-based API and scheduling using Celery;
  • Integration of a search capability for multi-word expressions;
  • Integration with the infrastructure at the Instituut voor de Nederlandse Taal (contributed by Koen Mertens of the Instituut voor de Nederlandse Taal).

GrETEL 5 builds on work done at the lab by Martijn van der Klis, Gerson Foks, Tijmen Baarda, Ben Bonfil and Sheean Spoel, and is directed by the applicant, prof. dr. Jan Odijk.
