4. Archiving data after your research
The storage of and access to data sets are of great importance in data management. With regard to storage, it is useful to distinguish between the ‘dynamic’ storage of data during research and the archival storage after finalization of a (sub) project.
Step 3. Data storage during your research offers support for storage during research. This step is about data storage ‘after your research’.
Once you make scientific claims, e.g. in a publication, you should be able to show how you’ve come to your conclusions. You do so by being transparent about your sources and methods.
The transparency of your sources can be attained by offering access to the data you’ve used, e.g. to create a graph or table with aggregated data in your publication.
Transparency about your methods can, in part, be attained by offering access to the scripts and algorithms you’ve used for data transformation.
For this you need to ‘archive’ and ‘publish’ your data in the state it was in when you created your graph or table. That specific data package is then in the ‘after research’ phase.
Note that this does not imply that your research project is finalized. Note as well that most funders require you to create a final deposit of all your data once the research project is at the end of its entire life cycle.
Storage after research
After research you need to store your relevant data sets as a coherent and FAIR package for a longer period (by default 10 years), with a set of metadata and a user license, in a place where the package can be retrieved, preferably using sustainable file formats. This normally happens in a repository. The UU offers both Yoda and DataVerse. There is also DANS (a service of the KNAW), which offers its own repository, DANS Easy.
The preferred storage solution for the Faculty of Humanities is Yoda. If, for some reason, you would like to use a different repository, you can check the repository finder of Research Data Management Support.
What data to archive?
You must archive all data needed for research replication and the data needed to safeguard the rights of your research participants.
With regard to personal data, it is the policy of the Faculty of Humanities to archive the following for the duration of the retention period: raw data, digitized consent forms of research participants, and key files (the files linking codes to personal data).
However, not all archived data packages need to be publicly accessible. In practice this means that you need to think through how you assemble your data packages and have a publication strategy, e.g. by publishing different versions of a data package, some as Open Access and others as Closed or Restricted Access data packages. In the Yoda metadata you can refer to these different versions (preferably by using their DOIs).
The template for listing data packages in Step 1. Establishing your sources can help you think through your publication strategy.
FAIR Data
Having FAIR data means that your data package is Findable, Accessible, Interoperable and Reusable.
Findability is ensured by requiring that each archived data package has a DOI attached to it.
Accessibility depends on the nature of the data. Data packages containing personal data will in principle not be publicly accessible, but accessible only to the researcher and/or the research group, depending on what has been agreed upon in the Informed Consent agreements.
Interoperability and Reusability (and Reproducibility) are supported by documentation such as metadata, codebooks, readme files and lab journals. Reusability is further supported by the requirement that all data be offered as Open Access (when no personal data is involved) under the CC BY-SA 4.0 license.
Adding metadata
Descriptions of your data are called metadata. The term is a catch-all for all types of descriptions.
If you don’t describe your data properly, they will very soon become useless, both for yourself and for others.
All repositories ask for a minimal set of metadata and offer a form for filling these out. In general, these metadata serve to describe the stored data package.
When you store a data package in Yoda for example, you are required to add at least the following elements:
- A title of the data package.
- A short description of the content of the data package.
- The name of the creator of the data package.
- A retention period, normally a minimum of 10 years, starting after the finalization of the (sub) project.
These metadata are for administrative use and do not help your fellow researchers understand the data sets in the data package. Therefore it is common practice to add interview protocols, readme files, codebooks, lab notes etc. to your data package in order to clarify the contents of specific files, collection and sampling methods, etc.
Often you also describe these matters in your scholarly publications. If the publisher holds no IP rights to the article, you might consider adding it to the data package, or copying the methods section into a readme.txt.
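As an illustration of the minimal metadata elements listed above, the following Python sketch checks that the required text elements are filled in and calculates when a retention period ends. The class and function names are hypothetical and do not reflect the actual Yoda metadata form or any Yoda interface; the 10-year default is taken from the text above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageMetadata:
    """Hypothetical record holding the minimal elements listed above."""
    title: str
    description: str
    creator: str
    retention_years: int = 10  # default minimum mentioned in the text


def missing_elements(meta: PackageMetadata) -> list[str]:
    """Return the names of the required text elements that are empty."""
    required = {
        "title": meta.title,
        "description": meta.description,
        "creator": meta.creator,
    }
    return [name for name, value in required.items() if not value.strip()]


def retention_end(finalized_on: date, retention_years: int = 10) -> date:
    """Date on which the retention period ends, counted from finalization."""
    target_year = finalized_on.year + retention_years
    try:
        return finalized_on.replace(year=target_year)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return finalized_on.replace(year=target_year, day=28)
```

A check like this could be run before filling out the repository form, so that an empty description or creator is noticed early. It says nothing about the quality of the descriptions themselves.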
DOIs and Data Catalogues
All archived data packages should be citable by means of a Digital Object Identifier (DOI).
A DOI is a so-called persistent identifier, which always redirects to a webpage known as a landing page. When you make a URL out of it by putting ‘https://doi.org/’ in front of the DOI, you can navigate to that page. If the data package is deleted, e.g. because the retention period is over, the repository managers will keep the landing page with the original description, but indicate that the object is no longer available and why.
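The step from DOI to URL described above can be sketched in a few lines of Python. The function name is hypothetical, and the DOI in the usage note is made up for illustration. The only check applied is that the input looks roughly like a DOI (it starts with ‘10.’ and contains a slash), which is an assumption of this sketch and not a full validation.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a URL that resolves to its landing page."""
    doi = doi.strip()
    # Accept input that already carries a 'doi:' label or the resolver prefix.
    for prefix in ("doi:", "https://doi.org/", "http://doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Rough plausibility check only (an assumption of this sketch).
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping '/' as is.
    return DOI_RESOLVER + quote(doi, safe="/")
```

For example, doi_to_url("10.1234/example-5678") gives ‘https://doi.org/10.1234/example-5678’.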
All repositories mentioned will provide you with a DOI. This implies that they will share your metadata with DataCite, the organization that hands out DOIs in the Netherlands. A number of environments, such as Yoda and DataVerse, also share the metadata of your data packages with the data catalogue DANS NARCIS.