Research Data – Latest News & Worth Knowing

Data Champions at the RWTH: Carolin Victoria Schneider

April 11th, 2024
Photo of Data Champion Carolin Victoria Schneider

Source: North Rhine-Westphalian Academy of Sciences, Humanities and the Arts | Bettina Engel-Albustin 2022

This article is part of our series “Data Champions at RWTH”. Data Champions are researchers or employees who have excelled in the field of research data management (RDM) and/or whose experience can break new ground for colleagues or serve as a guide.

Our first Data Champion is Prof. Dr. med. Carolin Victoria Schneider. She works at RWTH University Hospital and was named Young Scientist of the Year 2023 by academics. We spoke to Ms. Schneider about her connection to and interest in Research Data Management.


Ms. Schneider, Please Introduce Yourself Briefly

My name is Carolin Schneider, and I was born on June 14, 1995 in Neuss. I attended the Erzbischöfliches Gymnasium Marienberg in Neuss, where I completed my Abitur in June 2013. I then studied medicine at RWTH Aachen University from September 2013 to December 2019. During my studies, I received the Peter Scriba doctoral scholarship, which supported my doctorate in medicine; I worked on the doctorate at RWTH Aachen University from February 2016 to October 2020. Following my studies, I moved to the USA for a postdoctoral research stay at the University of Pennsylvania from December 2019 to June 2022, funded by the Walter Benjamin Fellowship of the German Research Foundation. This experience expanded not only my academic knowledge but also my intercultural skills.

Since July 1, 2022, I have been working as a research group leader and physician at RWTH Aachen University. This position allows me to work in both clinical practice and research, which is very important to me. On September 1, 2023, I took up the W1 professorship for Prevention and Genetics of Metabolic Liver Diseases at RWTH Aachen University. In addition, since December 1, 2023, I have been an Associate Professor of Translational Medicine and Human Genetics at the University of Pennsylvania in Philadelphia, USA. This transatlantic role enables me to build bridges between research and clinical application in two high-caliber academic environments and thus contribute to the advancement of medical science.

Which Collaborative Research Centre (CRC; German: SFB) Do You Work In? What Exactly Is It?

I am leading a project in CRC 1382, which focuses on the gut-liver axis. My research project in this CRC examines the complex interaction between the gut and liver, which is strongly influenced by the gut microbiome. This balance can be disturbed by external and hereditary influences and lead to disease. Diet in particular has been identified as an important regulator of the composition of the microbiome. However, the mechanistic links between diet, liver health and the microbiome are still poorly understood. We will use large multi-omics datasets to investigate the relationship between dietary changes and the development of liver diseases that we hypothesize are triggered by changes in the gut microbiome. These comprehensive computational studies will provide important information on the relationship between nutrients, microbiome, liver disease and metabolism, and will provide data for one or more therapeutic approaches that make use of the gut-liver axis.

What Is Your Connection to RDM? When Did You First Come Into Contact With It?

I became interested in RDM early on in my scientific career, especially when I began to focus more intensively on the use of data science methods and AI in medical research. During my PhD at RWTH Aachen University, I first came into direct contact with it when I started collecting and analyzing large amounts of data for my research work. This experience deepened during my postdoctoral stay at the University of Pennsylvania, where I had access to even larger and more diverse data sets.

The need to efficiently manage large and complex data sets quickly became a central aspect of my work. This included the careful collection, preparation and analysis of data from a variety of sources. I realized the importance of a structured and systematic approach to this data to ensure the integrity of the research and to achieve reproducible and valid results. Interdisciplinary collaboration underscored the importance of coherent research data management for efficient communication and cooperation between disciplines. An essential part of my commitment to research data management is also attention to ethical considerations and data protection regulations. The protection of patients’ privacy and the security of their data are of the utmost importance to me. This requires constant engagement with ethical guidelines and legal requirements to ensure that all data analysis is conducted responsibly and to the highest standards.

Why Is It Important?

Research data management is of crucial importance for several reasons. First of all, structured and efficient RDM ensures the quality and integrity of research results. By systematically collecting, storing and analyzing data, errors can be reduced and the reproducibility of study results improved. This is particularly important in a field where research results can have a direct impact on clinical decisions and patient treatment.

In addition, well-organized research data management enables the efficient use of resources. Medical research often generates large amounts of data, which can be time-consuming and costly to collect and analyze. Effective data management helps to avoid duplication of work, promotes the reuse of data and thus supports more cost-effective research.

Another important aspect is the promotion of collaboration and knowledge sharing within the scientific community (“open science”). By adhering to standards in research data management, data can be exchanged and made accessible more easily, which facilitates collaboration between researchers from different disciplines. This is particularly important in interdisciplinary research, which is necessary for understanding complex medical conditions and developing new therapies.

Finally, the protection of privacy and data security play a central role. Sensitive personal data is often collected in medical research. Responsible research data management ensures that this data is stored and processed securely and only used in compliance with ethical guidelines and legal regulations. This not only protects the privacy of patients and study participants, but also maintains the public’s trust in medical research.

How Do You Organize Your Data?

The organization of my research data plays a central role in my scientific work and follows a methodical approach that extends from planning and storage to analysis and, finally, the possible transfer of the data. Once the data has been collected or acquired, it is structured and stored in systems that guarantee secure and long-term storage. We use database systems that not only enable efficient storage and retrieval, but also ensure that the data is protected against unauthorized access. The data analysis is preceded by careful data cleansing. For the analysis itself, I use a variety of data science methods that can vary depending on the research question and data characteristics. These methods range from statistical analyses to complex machine learning procedures.

What Infrastructure (Elements) Do You Use?

The CRC provides us with very useful tools that facilitate collaboration and data analysis within the consortium. One of these tools is SharePoint, which we use to store and share certain data sets. SharePoint enables our team to work together efficiently and exchange documents and information securely. For computationally intensive tasks and processing large amounts of data, we use the RWTH Aachen University cluster. This resource is particularly valuable for data science analyses and simulations, as it provides the computing power needed to perform complex calculations efficiently, provided that this is compatible with the data security requirements of the data set. Our code is made available via GitHub. To organize and manage our research data, we use the RDM platform Coscine wherever possible. Coscine provides a comprehensive research data management solution that allows us to systematically catalog, store and make data available for collaboration within our team, within the CRC and with other partners.

The central element of our research data management, however, is the data management plan, which contains a detailed description of the measures for data storage, backup, transfer and archiving. This plan ensures that all aspects of data management are systematically addressed and that compliance with ethical and data protection regulations is guaranteed.

What Do You Need to Consider When Handling the Data?

To meet the challenges posed by different external and internal data sets, we rely on a combination of internal and external resources as well as specialized tools to ensure both the security and efficiency of our research processes. For example, the use of the Lifelines cohort in our project places particular demands on data management, especially the storage and processing of data. The data from the Lifelines cohort is stored on the cohort’s servers and cannot be downloaded. This forces us to adapt our research processes. Because access to the data is also restricted in terms of time, we have to plan our analyses carefully in advance and possibly develop new data analysis strategies specifically tailored to the infrastructure of the Lifelines cohort.

Are You Supported in Data Management or Do You Have Contact or Experience With Data Stewards?

In my work, I benefit greatly from working with the data stewards. They help me to develop effective data management strategies that meet the requirements of my specific research projects. Among other things, the data stewards support me in structuring and organizing my data, in selecting suitable tools and platforms for data storage and analysis, and in ensuring the traceability, accessibility and interoperability of all research data.

What Do You Wish for the Future? What Advice Can You Give Others?

My wish for the future is that awareness of the importance of structured and proactive research data management continues to grow in the scientific community. It is important that researchers consider data management as an integral part of research planning from the very beginning of their projects. A well-thought-out data management plan is not only a prerequisite for compliance with funding guidelines and legal requirements, but also forms the basis for high-quality, reproducible and reusable research results.

My key advice to other researchers, especially those at the beginning of their scientific career, is to draw up a data management plan early on! Such a plan not only helps to keep track of the collected data and ensure its quality, but also facilitates collaboration within the team and with external partners.

Ms. Schneider, thank you very much for the interesting interview! We wish you all the best and every success in your future endeavors.

Katharina Grünwald and Carolin Victoria Schneider are responsible for the content of this article.
