On March 8, 2023, it was time once again: data stewards, RDM officers and others interested in research data management (RDM) at RWTH Aachen University met online via Zoom for the regular open meeting of the RDM network. The topic of the meeting was the TRUST principles and how they can improve the reuse of research data.
RDM Speed Dating to get to know each other
The virtual meeting opened with the now well-established RDM Speed Dating. In short breakout sessions, participants exchanged ideas in small groups about data reuse and the best datasets they had worked with.
What Comes after FAIR?
Enhancing Data Reuse Through the TRUST Principles
Soo-Yon Kim, a Data Steward in the Cluster of Excellence Internet of Production, described from her perspective how the TRUST principles can improve the reuse of research data beyond what the FAIR principles achieve.
The FAIR principles serve as a guideline for researchers to prepare and document their data for optimal usability by humans and machines. If applied successfully, they should in theory enable interested parties to find, obtain and easily use data. However, although the reuse of research data promises researchers many advantages (cost reduction, a larger data basis, new collaborations, etc.), there is a gap between possible and actual reuse.
To close this gap and move from the theoretical possibility of reuse to actual reuse, further factors come into play that go beyond the FAIR principles; these are addressed by the TRUST principles.
TRUST Principles:
- Transparency: The scope of services, guidelines and target user group are communicated in a transparent and understandable way.
- Responsibility: Commitment to using common standardised formats where available; verification of the authenticity of datasets.
- User Focus: Consideration of the discipline-specific and individual needs of data users.
- Sustainability: Datasets are managed appropriately over the long term.
- Technology: Secure and durable infrastructure.
The TRUST principles thus address RDM infrastructure in particular and can support the (further) development of useful services. For example, are there services that help interested parties select from the available datasets and evaluate their relevance for their own purposes? What could such services look like (possibly recommendation systems for similar datasets, peer reviews, …)?
The TRUST principles give an idea of what data stewards, research data managers and repository operators can do to support data reuse beyond FAIRification, the process of making data FAIR. In this sense, they can serve as a source of inspiration.
Interactive Session on the Miro Board
The last part of the network meeting was interactive. On a prepared Miro board, participants answered questions about the four stages of data reuse, drawing on their own experiences.
Initiation
The first questions, “Have you ever searched for data?” and “Are you interested in reusing data and in what area?”, showed that many participants reuse research data internally, within a project or their institute. Important prerequisites for reuse include the completeness, comprehensibility and provenance of the data.
Discovering and Acquiring Data
In the second part, participants talked about the tools and services they use most. It turned out that most of them search for data on GitHub and Zenodo. Other solutions such as FAIRsharing, Google Dataset Search and, interestingly, ChatGPT were also mentioned. When asked “How do you discover data? What tools and services do you actually use?”, participants highlighted the linking of data to text publications and open databases such as the academia.edu platform, alongside the services already mentioned.
Understanding and Selecting
When selecting a dataset, participants considered aspects such as formats and open accessibility important. At the same time, they valued a trustworthy repository, repeated use of the dataset in publications, and the author’s reputation and trustworthiness. For understanding a dataset, familiar programming languages such as Python and tools such as Wikidata and DBpedia are important.
Process and Reuse
The following steps are typical for the actual reuse of data:
- Checking: Is the dataset readable and complete, and are all variables and values present?
- Processing: Format conversion, filtering, and cleaning up the dataset.
Fortunately, some datasets have already been reused; datasets from publications and GitHub repositories were the preferred choice. Overall, the meeting showed that, among the TRUST principles, the emphasis lay on “Responsibility” and “User Focus”.
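The checking and processing steps above can be illustrated with a short Python sketch. It is a minimal example, assuming a small CSV file with a hypothetical schema (`sample_id`, `temperature`, `pressure`); the sample data, the column names and the helper functions `check_dataset` and `process_dataset` are made up for illustration and were not presented at the meeting.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded dataset (not from the meeting).
SAMPLE = """sample_id,temperature,pressure
A1,21.5,1013
A2,,1009
A3, 22.1 ,1011
"""

# Assumed schema: expected variables and how to convert each one.
CONVERTERS = {"sample_id": str, "temperature": float, "pressure": int}


def check_dataset(text, expected_columns):
    """Checking: can the file be read, and are all variables and values present?"""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    missing_columns = [c for c in expected_columns if c not in fieldnames]
    rows = list(reader)
    # Indices of rows in which at least one expected value is empty or absent.
    incomplete_rows = [
        i for i, row in enumerate(rows)
        if any(not (row.get(c) or "").strip() for c in expected_columns)
    ]
    return rows, missing_columns, incomplete_rows


def process_dataset(rows, converters):
    """Processing: clean up values, filter out incomplete rows, convert formats."""
    cleaned = []
    for row in rows:
        # Cleaning: strip stray whitespace from every expected value.
        values = {c: (row.get(c) or "").strip() for c in converters}
        if not all(values.values()):
            continue  # Filtering: skip records with missing values.
        # Format conversion: turn text fields into typed values.
        cleaned.append({c: convert(values[c]) for c, convert in converters.items()})
    return cleaned


rows, missing, incomplete = check_dataset(SAMPLE, list(CONVERTERS))
print("missing columns:", missing)     # missing columns: []
print("incomplete rows:", incomplete)  # incomplete rows: [1]
print(process_dataset(rows, CONVERTERS))
```

Here the check only reports problems and leaves the decision to the user, while the processing step silently drops incomplete records; in practice, whether to drop, impute or flag such rows depends on the dataset and the intended reuse.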
The Next Open Meeting of the RDM Network – Save the Date
Date: April 12, 2023
Time: 10am to 12 noon
Location: Conference Room UB (Room 509)
Topic: networking@rwth BarCamp
Learn More
If you would like to become part of the RDM network at RWTH, subscribe to the mailing list “DataStewards@RWTH”.
If you have any questions about the RDM network or RDM in general, just write a message to the IT-ServiceDesk. The RDM Team looks forward to hearing from you.
______
Daniela Hausen, Sophia Nosthoff and Ute Trautwein-Bruns are responsible for the content of this article.