{"id":12785,"date":"2022-09-21T12:00:18","date_gmt":"2022-09-21T10:00:18","guid":{"rendered":"https:\/\/blog.rwth-aachen.de\/itc\/?p=12785"},"modified":"2023-02-10T14:26:35","modified_gmt":"2023-02-10T13:26:35","slug":"projektverzug-archivmigration","status":"publish","type":"post","link":"https:\/\/blog.rwth-aachen.de\/itc\/en\/2022\/09\/21\/projektverzug-archivmigration\/","title":{"rendered":"Project Delay Archive Migration"},"content":{"rendered":"<p><div id=\"attachment_12786\" style=\"width: 310px\" class=\"wp-caption alignright\"><a href=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-12786\" class=\"wp-image-12786 size-medium\" src=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920-300x200.png\" alt=\"Coming Soon with hourglass\" width=\"300\" height=\"200\" srcset=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920-300x200.png 300w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920-1024x683.png 1024w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920-768x512.png 768w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920-1536x1024.png 1536w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/coming-soon-hour-glass-g547b23fd4_1920.png 1920w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-12786\" class=\"wp-caption-text\">Source: <a href=\"https:\/\/pixabay.com\/de\/illustrations\/coming-soon-stunden-glas-4721933\/\">Pixabay<\/a><\/p><\/div><\/p>\n<h3><span style=\"color: #00549f;\"><em>***Update***<\/em><\/span><\/h3>\n<p><em>Currently, the migration of the archive data is still ongoing and will last beyond the extended 
project goal. Users whose data has not yet been migrated will be contacted by email. You can check the current status of the migration on our <a href=\"https:\/\/archivemigration.pages.rwth-aachen.de\/archiveanalysis\/python-report.html\">reporting page<\/a>. (*)<\/em><\/p>\n<p>&nbsp;<\/p>\n<p>The &#8220;Archive Migration&#8221; project, in which we are transferring archived data from the existing TSM system to the new target systems <a href=\"https:\/\/help.itc.rwth-aachen.de\/en\/service\/44830fa165f14469be64823f6016cd9e\/\">DigitalArchive<\/a> and <a href=\"https:\/\/help.itc.rwth-aachen.de\/en\/service\/7ab6210773b04ef28a1a8cb33628be67\/article\/da644c2defb9492ea2eb82bbae5ea0d6\/\">Coscine<\/a>, is increasingly turning into a mammoth project. Despite extensive planning of the five sub-projects and comprehensive communication on the classification of the archive nodes, unforeseeable problems have occurred, and continue to occur, in the technical implementation of the migration, and these are now delaying the project.<!--more--><\/p>\n<h3><span style=\"color: #00549f;\">Planned project end: December 31, 2022<\/span><\/h3>\n<p>Due to the accumulated problems and the constant need to adapt our workflows, the end of the project is expected to be delayed until December 31, 2022. We want to assure everyone affected, however, that the archived data is safe with us. The top priority in the archive migration is to preserve the integrity of the data being migrated.<\/p>\n<p>With this blog post, we would like to provide transparent insight into the challenges and technical issues we have encountered so far in the archive migration.<\/p>\n<p>Even though we can only report on an interim status today, we would like to thank all node contact persons and backup admins who classified archive nodes (as research data, course data, or other data), in some cases on behalf of node contact persons who have since left. 
In this way, they laid the groundwork for migrating the archive nodes to the correct target systems. Nevertheless, rework was necessary in many cases, so metadata for individual nodes is still being collected even now.<\/p>\n<p>In the process, we found that some nodes had been classified even though they did not hold any data at all. Since there is nothing to migrate, these nodes are given the status &#8220;No Migration&#8221;.\u00a0 The respective node contact persons are informed about this by e-mail.<\/p>\n<h3><span style=\"color: #00549f;\">Ongoing new problems since the start of the technical migration<\/span><\/h3>\n<p>The original plan to automatically migrate archived data from the existing TSM system to the new DigitalArchive and Coscine target systems did not work out due to a variety of technical issues. This is the primary reason for the project delay. The volume of data to be migrated is large, over 1,690,722 GB (approx. 1.7 PB) with over 785 million objects, so the data is inevitably heterogeneous to some degree. As a result, we have continuously run into new problems since the start of the technical migration that we could not have foreseen.<\/p>\n<p>We would like to elaborate on the biggest challenges and problems we have encountered so far.<\/p>\n<ul>\n<li><strong>Platform problems on the existing system:<\/strong><br \/>\nIn the existing system TSM, data could be imported into a Windows or a Linux node. The two platforms require completely different adaptations, and every &#8220;exception&#8221; that we encounter with the archive nodes has to be handled for both platforms.<\/li>\n<li><strong>Heterogeneity of the stored data, e.g. 
encoding of file names:<\/strong><br \/>\nThe existing system TSM does not use the UTF-8 encoding standard, so file names containing special characters and the like were not output correctly and we could not migrate them correctly. Since the character encoding used by TSM is not documented, we first had to spend a lot of time working out how to decode it to enable a correct migration. The target systems use the UTF-8 standard, so this problem will no longer occur in the future.<\/li>\n<li><strong>Encrypted nodes:<\/strong><br \/>\nIn the existing system, it was possible to encrypt nodes via TSM. Only those in possession of the key could access the data. Even we in the IT Center do not know this key, and there is <strong>no<\/strong> way for us to access and migrate this data. We will individually contact the contact persons of nodes for which we detect such encryption and inform them about the possibility of &#8220;self-initiated migration&#8221;.<\/li>\n<li><strong>Empty nodes that were nevertheless classified:<\/strong><br \/>\nWe are grateful that so many responded to the call to classify archive nodes. However, some of the classified nodes have turned out to contain <strong>no<\/strong> files that we can migrate. For this reason, we are once again individually contacting the contact persons of these nodes to let them know that their empty nodes will be given the status &#8220;No Migration&#8221; and that no migration will take place. Where there is no data, no data can be migrated. Of course, those affected will have another opportunity to check this status. More details will be communicated in the corresponding e-mail to those affected.<\/li>\n<li><strong>Final notification about migrated nodes:<\/strong><br \/>\nDue to an incorrectly configured script, the contact persons of the nodes were informed in June 2022 that their data had supposedly been migrated. 
The link in the e-mail led to the target system Coscine, but only to the top level of the personal area, without any indication of which data had been migrated. We were able to quickly identify and correct the error and then contacted the people concerned through various channels. Once again, we apologize for the volume of e-mails and the confusion. To make it easier to see which archive node data has been successfully migrated, we will include the project name in the notification e-mail.<\/li>\n<li><strong>Archive node access error (error code 500):<\/strong><br \/>\nResearch data is migrated to the so-called RDS-NRW share, which has a very high level of protection thanks to geo-redundancy across different locations in NRW. During an update of the firewall firmware of this RDS-NRW share at the end of August 2022, a faulty configuration occurred. This resulted in error code 500 being displayed in Coscine. The system has since been configured correctly and migrated research data can be accessed again.<\/li>\n<li><strong>Very large archive nodes:<\/strong><br \/>\nJust how much data accumulates in a research context is demonstrated by our &#8220;large candidates&#8221;, archive nodes ranging from over 90 TB up to 190 TB. We cannot migrate these nodes with the envisaged automated workflow, but have to perform the steps of downloading from TSM and uploading to the target system manually. This also means that the migration progress has to be monitored continuously. Since the existing system TSM is a tape storage system, reading out the tapes is a mechanical process that takes a correspondingly long time for such large nodes. 
[If you want to know how such a tape storage system works, watch this <a href=\"https:\/\/www.youtube.com\/watch?v=CVN93H6EuAU\">video<\/a> on the topic.]<\/li>\n<li><strong>Archive nodes with very many objects:<\/strong><br \/>\nIn addition to huge archive nodes, archive nodes with very many individual objects also cause us problems. Almost 100 nodes have more than 10 million objects, which is not only time-consuming (reading data from TSM and uploading it again), but also technically challenging.<\/li>\n<\/ul>\n<p>Each of these problems is manageable in isolation, but together they have slowed down the technical migration.<\/p>\n<p>Thanks to the wonderful support and diverse know-how of our colleagues, we have found ways and means to move forward with the archive migration. With ongoing adjustments to scripts and workarounds, the staggering of migration phases, and the deployment of more staff, we are now well on our way and confident that we will be able to send more and more completion notifications to the contact persons of the nodes on an ongoing basis.<\/p>\n<h3><span style=\"color: #00549f;\">Public reporting available<\/span><\/h3>\n<p>In addition, we are steadily working on processing archive migration requests as well as expanding our archive migration reporting. We ask all users for a little understanding and patience during processing. The archive will continue to be available on a read-only basis. If necessary, data can be downloaded from the archive. Uploads, however, are now possible only via DigitalArchive and Coscine.<\/p>\n<p>Naturally, there is a lot of interest in how far the respective archive nodes have progressed in the migration. For this reason, we have developed a report for all interested parties. The <a href=\"https:\/\/archivemigration.pages.rwth-aachen.de\/archiveanalysis\/python-report.html\">reporting page<\/a> is updated hourly and shows the progress of the last hour as well as compared to the previous day. 
In order to work as efficiently as possible, we have introduced a system of classifications and statuses, using abbreviations such as &#8220;migcResearch&#8221; (successful migration to Coscine, no completion notification sent yet). On the page we explain what these abbreviations mean and how to read the report.<\/p>\n<div id=\"attachment_12787\" style=\"width: 905px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Reporting.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-12787\" class=\"wp-image-12787 size-full\" src=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Reporting.png\" alt=\"View of Reporting\" width=\"895\" height=\"564\" srcset=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Reporting.png 895w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Reporting-300x189.png 300w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Reporting-768x484.png 768w\" sizes=\"auto, (max-width: 895px) 100vw, 895px\" \/><\/a><p id=\"caption-attachment-12787\" class=\"wp-caption-text\">View of Reporting<br \/>Source: Own illustration<\/p><\/div>\n<p>In the &#8220;Individual Node Report&#8221; overview, you can look up the status of your archive node by its ID, which can be read from the URL of the metadata form.<\/p>\n<div id=\"attachment_12788\" style=\"width: 891px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Individual-Node-Report.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-12788\" class=\"wp-image-12788 size-full\" src=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Individual-Node-Report.png\" alt=\"View of the &quot;Individual Node Report&quot;\" width=\"881\" height=\"308\" srcset=\"https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Individual-Node-Report.png 881w, 
https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Individual-Node-Report-300x105.png 300w, https:\/\/blog.rwth-aachen.de\/itc\/files\/2022\/09\/Uebersicht-Individual-Node-Report-768x268.png 768w\" sizes=\"auto, (max-width: 881px) 100vw, 881px\" \/><\/a><p id=\"caption-attachment-12788\" class=\"wp-caption-text\">View of the &#8220;Individual Node Report&#8221;<br \/>Source: Own illustration<\/p><\/div>\n<p>We apologize for the delay to all those who were worried about their archive nodes in the meantime. Unfortunately, it took a little longer this time, but things are looking good and we are doing our best to safely and securely transfer all data to the designated target systems.<\/p>\n<p>&nbsp;<\/p>\n<p>Responsible for the content of this article are <a href=\"https:\/\/www.itc.rwth-aachen.de\/cms\/IT-Center\/IT-Center\/Team\/~epvp\/Mitarbeiter-CAMPUS-\/?gguid=0x741F3A251551044BB9047AF649DED3B4&amp;allou=1&amp;lidx=1\">Lukas C. Bossert<\/a> and <a href=\"https:\/\/www.itc.rwth-aachen.de\/cms\/IT-Center\/IT-Center\/Team\/~epvp\/Mitarbeiter-CAMPUS-\/?gguid=0x076EFD6C62ADCF4D868FB7134A14B07C&amp;allou=1&amp;lidx=1\">Nicole Filla<\/a>.<\/p>\n<h6>(*) The paragraph was updated on February 09, 2023.<\/h6>\n<p><\/p>","protected":false},"excerpt":{"rendered":"<p>Sorry, this entry is only available in 
Deutsch.<\/p>\n","protected":false},"author":1859,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"c2c_always_allow_admin_comments":false,"footnotes":""},"categories":[315],"tags":[42,459,43,46,163],"class_list":["post-12785","post","type-post","status-publish","format-standard","hentry","category-services-support","tag-archiv","tag-archivknoten","tag-archivmigration","tag-coscine","tag-simplearchive"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/posts\/12785","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/users\/1859"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/comments?post=12785"}],"version-history":[{"count":7,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/posts\/12785\/revisions"}],"predecessor-version":[{"id":13949,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/posts\/12785\/revisions\/13949"}],"wp:attachment":[{"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/media?parent=12785"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/categories?post=12785"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.rwth-aachen.de\/itc\/en\/wp-json\/wp\/v2\/tags?post=12785"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}