Researchers face the daily challenge of efficiently managing and accessing large amounts of data. Nowadays, data management goes beyond simple storage. According to the FAIR principles, research data should be Findable, Accessible, Interoperable and Reusable. This is exactly where Coscine (Collaborative Scientific Integration Environment) comes in: a platform that supports researchers throughout the data lifecycle.
But what exactly does that mean? Let’s take a look at the structure of Coscine and explain step by step how the platform for research data management (RDM) works.
The Structure of Coscine
The Coscine graphic shows how the various components are connected and how Coscine is structured.
1. Login and Access
The first step in using Coscine is to log in. Several login methods are currently supported, including DFN-AAI, ORCID and RegApp. This means that researchers from any university or research institution can easily access the platform. The different login methods can also be linked to one another. After logging in, you are taken to the user interface (UI). This interface is the “start page” from which everything can be controlled. The UI is directly connected to the API (Application Programming Interface), which serves as the central interface.
2. API: The Interface
The API is at the center of the graphic. An API is a kind of bridge that connects different programs and services. The different parts of Coscine communicate with each other via the API.
The API is therefore the linchpin through which all other components of the platform are linked. It ensures that the various elements work together smoothly.
3. Resources
The various resources are an essential part of Coscine. The following two resources are available to all researchers:
- Linked Data: With Linked Data it is possible to manage metadata for files in external systems that are not integrated in Coscine.
- GitLab: With the GitLab resource type, it is possible to manage metadata for GitLab repositories in Coscine.
The following three resources, on the other hand, are available to authorized DH.NRW universities:
- Web: Web resources can be created in any project without a storage space request so that the data can be uploaded via the browser. Authorized users receive 100 GB and can, if necessary, increase the storage space further by submitting an application.
- S3: This resource is particularly suitable for large amounts of data. S3 resources can be used via the S3 protocol with various clients such as WinSCP, Cyberduck or MinIO Client.
- WORM: WORM stands for Write once, read many. Once saved, data can never be deleted, changed or overwritten again. This resource type is therefore only suitable for data that absolutely requires such a high level of protection.
Depending on the resource type, different clients (e.g. S3 clients and Git clients) can be used for direct access.
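As a rough illustration of such direct access, the following sketch lists the object keys in an S3 resource from Python. It assumes the third-party boto3 library as the S3 client, which the article does not mention. The endpoint URL, bucket name and credentials are placeholders that would have to be replaced with the values shown for the actual resource.

```python
from typing import Any


def s3_client_kwargs(endpoint_url: str, access_key: str, secret_key: str) -> dict[str, Any]:
    """Collect the keyword arguments boto3 needs for a non-AWS S3 endpoint."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def list_keys(endpoint_url: str, access_key: str, secret_key: str, bucket: str) -> list[str]:
    """Return the object keys in a bucket (first page only, at most 1000 keys)."""
    import boto3  # third-party dependency (assumption), imported only when needed

    client = boto3.client(**s3_client_kwargs(endpoint_url, access_key, secret_key))
    response = client.list_objects_v2(Bucket=bucket)
    return [item["Key"] for item in response.get("Contents", [])]


# Placeholder values for illustration only; not real Coscine settings.
EXAMPLE_KWARGS = s3_client_kwargs("https://s3.example.org", "ACCESS_KEY", "SECRET_KEY")
```

Graphical clients such as WinSCP or Cyberduck ask for the same three pieces of information: the endpoint, the access key and the secret key.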
All data needed to manage users, projects and resources is stored in an SQL database (SQL stands for Structured Query Language). This database forms the foundation on which Coscine is built.
4. Quick Search for Data
An important component of Coscine is the ability to search data and metadata quickly and easily. This is where Elasticsearch comes into play. Elasticsearch is a search engine that quickly searches through huge amounts of data and delivers relevant results. Coscine also offers a semantic search. With semantic search, pieces of information are placed in context with one another, so that a search also finds linked elements. To make this possible, the Semantic Search project creates dedicated documents for Elasticsearch.
The graphic shows various ways of interacting with the API. Interaction is possible via existing clients written in C# or TypeScript. Users can also create their own client using the OpenAPI definition. The API can be “explored” via the Swagger page generated from the API definition. In addition, a further client is implemented in Python (Coscine Python SDK) and offers additional features for using Coscine.
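To make the idea of a self-written client more concrete, here is a minimal sketch that uses only the Python standard library. The base URL, the `/projects` path and the bearer-token header are assumptions for illustration. The actual routes and authentication scheme should be taken from the OpenAPI definition or the Swagger page.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the one given in the API definition.
BASE_URL = "https://coscine.example.org/api"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Prepare an authenticated GET request without sending it."""
    url = f"{BASE_URL.rstrip('/')}/{path.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {token}",  # assumed token scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(path: str, token: str):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, token), timeout=30) as response:
        return json.load(response)


# Building the request is safe to try offline; fetch_json needs a real endpoint.
request = build_request("/projects", "my-token")
```

Separating `build_request` from `fetch_json` keeps the URL and header logic easy to inspect before any network traffic happens.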
5. FAIR Digital Objects (FDO)
Another important element in the graphic is the FAIR Digital Object (FDO), i.e. data that is findable, accessible, interoperable and reusable. Each FDO contains important metadata (information about the data itself) and a persistent identifier (PID), which ensures that the data remains available and findable in the long term.
The metadata of the FDOs is stored in the QuadStore, while the PIDs are stored in an external, specially created service. A QuadStore is a special database that focuses on linked data, e.g. RDF (Resource Description Framework) data whose structure is described and validated with SHACL (Shapes Constraint Language). The data can then be integrated into linked data platforms via the QuadStore and queried using SPARQL clients.
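The term "quad" refers to statements with four parts: subject, predicate, object and the named graph the statement belongs to. The following toy sketch shows that idea with a small in-memory list and a pattern match. It is not how Coscine's QuadStore is implemented, and all identifiers under example.org are hypothetical.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str


def match(quads: list[Quad],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None,
          graph: Optional[str] = None) -> list[Quad]:
    """Return all quads that agree with every position that is not None."""
    pattern = (subject, predicate, obj, graph)
    return [q for q in quads
            if all(want is None or want == have for want, have in zip(pattern, q))]


TITLE = "http://purl.org/dc/terms/title"      # Dublin Core Terms property
CREATOR = "http://purl.org/dc/terms/creator"  # Dublin Core Terms property
GRAPH_1 = "https://example.org/graph/resource-1"  # hypothetical named graph
GRAPH_2 = "https://example.org/graph/resource-2"  # hypothetical named graph

DATA = [
    Quad("https://example.org/file/1", TITLE, "Tensile test A", GRAPH_1),
    Quad("https://example.org/file/1", CREATOR, "Jane Doe", GRAPH_1),
    Quad("https://example.org/file/2", TITLE, "Tensile test B", GRAPH_2),
]

# All titles across graphs, then everything stored in one named graph.
titles = [q.obj for q in match(DATA, predicate=TITLE)]
first_graph = match(DATA, graph=GRAPH_1)
```

A SPARQL client expresses the same kind of pattern declaratively, with variables in the positions that are left as `None` here.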
6. Metadata Profiles
Coscine offers the AIMS Metadata Profile Generator to ensure that the data is correctly described and organized. This helps to create metadata profiles or to find existing profiles that precisely describe the data and thus facilitate reusability and searching.
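What a metadata profile does can be pictured as a set of rules that each metadata record has to satisfy. The sketch below is a deliberately simplified stand-in written in plain Python, not SHACL and not the output of the AIMS generator. The field names and rules are hypothetical and only mimic two common kinds of constraint: a required value and an expected data type.

```python
from typing import Any

# Hypothetical profile: which fields exist, which are required, which type they have.
PROFILE: dict[str, dict[str, Any]] = {
    "title": {"required": True, "type": str},
    "creator": {"required": True, "type": str},
    "created": {"required": False, "type": str},
}


def validate(metadata: dict[str, Any], profile: dict[str, dict[str, Any]]) -> list[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    violations = []
    for field, rule in profile.items():
        if field not in metadata:
            if rule["required"]:
                violations.append(f"{field}: missing required value")
            continue
        if not isinstance(metadata[field], rule["type"]):
            violations.append(f"{field}: expected {rule['type'].__name__}")
    return violations


complete = validate({"title": "Tensile test A", "creator": "Jane Doe"}, PROFILE)
incomplete = validate({"title": 5}, PROFILE)
```

Here `complete` is an empty list, while `incomplete` reports the wrongly typed title and the missing creator.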
Conclusion: Coscine Makes Research Easier
Coscine makes it easier for researchers to handle all aspects of their data, from storage and search to long-term use. The clear structure of the platform and the central API make it easy to access various storage options and find data quickly. Particularly valuable is the integration of FAIR Digital Objects, which ensure that data is findable, accessible and reusable not only today but also in the future.
Coscine offers a flexible, modern solution for the requirements of research. Whether you need to store a small amount of data or search through huge datasets, Coscine has the right tools to make your work more efficient and sustainable. It’s not just a tool for the moment, but a platform that can also meet future challenges.
Responsible for the content of this article are Laurin Ellenbeck and Arlinda Ujkani.