
IT Center Blog

Archive Migration – Project Completion

June 23rd, 2023
Image: Symbol image for the completion of the archive migration project (source: Freepik)

“It is done.” We can finally say these words about the “Archive Migration” project. After more than two and a half years, the last data from the legacy system was migrated to the digital archive or to Coscine last week. With the project complete, we can look back on that time and take stock of the progress.

The Beginnings

The first concept for the upcoming migration was presented in December 2020 and discussed within the IT Center. It quickly became clear that the project would need more than one department to handle the tasks at hand. Because the original focus was on migrating research data, the project was placed in the “Research Data Management” group of the “IT-PFL” department (now PDSL). The “Systems & Operations” department carried out the technical implementation, while the “Service & Communication” department handled public relations and communication with users. The migration project was presented to various internal and external committees in February 2021, which marked the start of the project.

The work was divided into five subprojects so that the individual work packages could begin. Subproject 1 dealt with the actual technical migration of the data from the legacy system. Subproject 2 was responsible for the interface to the digital archive (which did not yet exist at this early stage). Subproject 3 created the form that users had to fill in to classify their archive nodes. Subproject 4 was responsible for the connection to Coscine, and subproject 5 for direct and indirect communication with users.

Milestones and Difficulties Along the Way

An important milestone was reached at the beginning of December 2021, when the TSM archive was switched to read-only access. From then on, users could classify their archive nodes, which meant providing the metadata needed to determine the future storage location. Based on this classification, research data were migrated to Coscine, while course data and other data were migrated to the digital archive. In parallel, the interfaces to the target systems were tested with the preceding migration of “simpleArchive”. Even at this point, it became apparent that the planned schedule would be difficult to meet.

When we started the actual migration of data in the summer of 2022, we never imagined that, merely to transfer data from one system to another, we would be developing and adapting the migration scripts continuously until the very end. To keep the waiting archive node owners informed about the ongoing issues and challenges, we decided to describe the situation openly and transparently in a blog post in September 2022. We also set up a publicly accessible reporting page where users could check the status of the project and of their own archive node. Internally, we expanded the project group and mobilized resources at different levels: with more employees, we were able to migrate the data of problematic nodes manually while continuing to work intensively on the script for the automated migration. Similarly, the number of virtual machines was increased to over two dozen so that many archive nodes could be migrated in parallel. This way of working was maintained until the end. Finally, when TSM backup was switched to “read only” in January 2023, the limited number of tape drives could be used exclusively for the migration, which significantly accelerated progress.

The Groups of People

Over the course of the project, several groups were formed and became directly involved. In addition to the actual project team, there was the “stakeholder group”, made up of people from different institutions and with different expertise, who reviewed our processes from an outside perspective and provided important feedback. Throughout the project, the “project advisory board” was an important body: it was tasked with informing the project management in regular meetings and took part in decisions on necessary process adjustments.

Beyond the process described above, the numbers also reflect the size of this mammoth project: the old archive held more than 1.7 PB of data, distributed over more than 1,000 nodes. Half of these nodes (about 260 TB) were not migrated, either because the users did not want to migrate them or because the necessary metadata was not provided. The remaining data is distributed as follows:

Figure: Numbers of the archive migration project (source: own illustration)

In total, more than 600,000,000 objects amounting to almost 1.5 PB were migrated from the TSM archive to Coscine or to the digital archive.
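
As a rough plausibility check of these figures, the following sketch recomputes the migrated volume from the numbers quoted above. It assumes decimal units (1 PB = 1,000 TB) and treats the rounded values from the text (“over 1.7 PB”, “about 260 TB”, “more than 600,000,000 objects”) as exact, so the results are only estimates. The variable names and the derived average object size are illustrative and not part of the project's own reporting.

```python
# Rough consistency check of the figures quoted in the text.
# Assumption: decimal units, i.e. 1 PB = 1,000 TB = 1,000,000,000 MB.
TB_PER_PB = 1_000
MB_PER_PB = 1_000_000_000

total_pb = 1.7                  # "over 1.7 PB" in the old archive
not_migrated_tb = 260           # "about 260 TB" on nodes that were not migrated
migrated_objects = 600_000_000  # "more than 600,000,000 objects"

# Volume that should remain after subtracting the non-migrated nodes.
migrated_pb = total_pb - not_migrated_tb / TB_PER_PB

# Illustrative estimate only: average size per migrated object.
avg_object_mb = migrated_pb * MB_PER_PB / migrated_objects

print(f"Estimated migrated volume: {migrated_pb:.2f} PB")
print(f"Estimated average object size: {avg_object_mb:.1f} MB")
```

Under these assumptions, the estimate comes out at about 1.44 PB, in line with the “almost 1.5 PB” stated above, with an average object size of a few megabytes.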

A Heartfelt Thank You

This project could not have been accomplished without the active support of the many project participants. Our heartfelt thanks go first and foremost to the members of the project group, who maintained the necessary stamina to migrate even the last bits and bytes into the target systems. Likewise, a big thank you goes to the project advisory board for their very helpful feedback and always constructive discussions. We would especially like to thank the members of the “stakeholder group”, who gave us the users’ perspective on the project. Finally, we would like to thank the archive users themselves for their patience and for understanding that, as is so often the case, not everything goes smoothly with projects of this size. We have learned a lot for the next migration.

_________________________________

 

Lukas C. Bossert and Sascha Bücken are responsible for the content of this article.

 
