IT Center Changes

GPU resources

September 24th, 2024 | by

In order to unify the resource allocation process and to avoid confusion, we will apply the following changes on October 2nd, 2024:

  • If you want to apply for GPU resources, you have to use “k GPU-h” (i.e., thousand GPU hours) instead of “Mio. core-h”. This applies to the detailed project description as well as the JARDS online form. For convenience, the JARDS system will still calculate the core-hour equivalent. Of course, until the end of the year we will still accept the old metric in your detailed project descriptions.
  • The views in JARDS.project (RWTH projects, NHR projects) will show the used contingents in “k GPU-h” instead of “Mio. core-h” for all GPU resources.
  • The command line tool r_wlm_usage will show the used contingents in “k GPU-h” instead of “Mio. core-h” for all GPU resources (i.e. “ML partition”).
  • If you act as a scientific reviewer, you should recommend GPU resources in “k GPU-h” instead of “Mio. core-h”.

There will be no changes for the HPC partition (i.e., CPU resources). Please note: the internal billing mechanisms will not change at all. Our Slurm configuration will still use a billing scheme that accounts for the memory and core equivalents of the nodes used; essentially, a factor of 24 between GPU-h and core-h applies.
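
For illustration: a request of 5 k GPU-h (i.e., 5,000 GPU hours) corresponds to 5,000 × 24 = 120,000 core-h, i.e., 0.12 Mio. core-h in the old metric; this is the equivalent that JARDS will continue to display.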

 

You can find all limits for the different project categories on our website.


You can track any disruptions or security advisories that may occur due to the aforementioned change in the Email category on our status reporting portal.

Default Anaconda Repositories have been blocked on the HPC Cluster

September 20th, 2024 | by

Dear CLAIX Users,

As you may have noticed, access to the “default” Anaconda repositories (repo.anaconda.com and others hosted at anaconda.com) has been blocked by the firewall on the HPC cluster. This action is necessary because RWTH is not permitted to use these repositories due to licensing issues.

We understand that blocking the Anaconda domain may disrupt your current workflows. To mitigate this, we set conda-forge as the default channel in /etc/conda/condarc. If you still encounter issues, please check the .condarc file in your home directory and make sure to remove “defaults”, “r”, and “main” from the channel list.
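
If you prefer the command line over editing the file by hand, a cleanup along the following lines should work (a minimal sketch; only remove the channel entries that are actually present in your .condarc):

  conda config --show channels              # inspect the current channel list
  conda config --remove channels defaults   # remove unwanted entries (only if present)
  conda config --remove channels r
  conda config --remove channels main
  conda config --add channels conda-forge   # make sure conda-forge is listed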

For new Conda users, we suggest using Miniforge, as this distribution uses conda-forge as its default repository.

We apologize for any inconvenience this may cause. If you require assistance, please reach out to servicedesk@itc.rwth-aachen.de.

PS: For a brief time, we accidentally blocked anaconda.org. We have corrected this issue.

HPCWORK Now Offers Increased File Quotas

July 19th, 2024 | by

Dear CLAIX Users,

As you might remember, in April we transitioned the HPCWORK directories to a new Lustre system. This new system supports significantly larger file quotas. We are pleased to announce that you now have a default quota of 1 million files.
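
If you would like to check how much of your file quota is currently in use, the standard Lustre quota tool can be used; a minimal sketch, assuming the HPCWORK environment variable points to your HPCWORK directory:

  lfs quota -h -u $USER $HPCWORK    # shows block and file (inode) usage and limits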

We hope this enhancement makes your workflows easier and less constrained.

Happy computing!
Your HPC Admins

[CLAIX-2023] Update to Rocky Linux 8.10

June 20th, 2024 | by

The RWTH Compute Cluster CLAIX-2023 was updated to Rocky Linux 8.10 because Rocky 8.9 has reached end of life; the update ensures the continued availability of security and bugfix updates for the system.

Detailed changes from Rocky 8.9 to Rocky 8.10 can be found in the Rocky Linux release notes.

Note: Due to the update, measured performance may vary compared to previous compute jobs because the updated system includes changed library and application versions.



Final decommissioning of CLAIX-2018

June 3rd, 2024 | by

All remaining CLAIX-2018 systems (login and backend nodes) have now been decommissioned, with the following exceptions:

  • login18-4 and copy18-1 stay online for integrative hosting customers. All other users should use the new CLAIX-2023 dialog systems.
  • login18-g-1 stays online until login23-g-1 is available again (see the maintenance page).

 



Change of default partition to CLAIX-2023

May 15th, 2024 | by

The default partition for all projects was changed from CLAIX-2018 to the corresponding CLAIX-2023 partition (e.g., c23ms for CPU jobs, c23g for GPU jobs).
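
If you want to select a partition explicitly rather than relying on the default, the usual Slurm directive can be used in your job script; a minimal sketch using the partition names above:

  # request the CLAIX-2023 CPU partition explicitly
  #SBATCH --partition=c23ms
  # or, for GPU jobs, the CLAIX-2023 GPU partition:
  #SBATCH --partition=c23g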



Decommissioning of further CLAIX-2018 nodes

May 8th, 2024 | by

Today, a further 287 CLAIX-2018 MPI nodes were decommissioned.

 



Decommissioning of CLAIX-2018

May 3rd, 2024 | by

CLAIX-2018 has reached end of life. The first 516 (empty) nodes were switched off today (May 3rd, 2024). Over the next few weeks, further systems will gradually be taken out of service. The final decommissioning of the remaining CLAIX-2018 nodes will take place on May 31st, 2024.

 



Decommissioning of first CLAIX-2018 GPU nodes

April 10th, 2024 | by

We have decommissioned 25 CLAIX-2018 GPU nodes. We strongly recommend migrating to the CLAIX-2023 ML nodes as soon as possible.

 



System maintenance on 17 April 2024

April 10th, 2024 | by

Dear cluster users,

On 17 April 2024, we will carry out full maintenance of the cluster. The following points will be addressed during the maintenance:

* new kernel so that user namespaces can be reactivated, see also [1]
* update of the InfiniBand stack of CLAIX23 to stabilise and improve performance
* migration of the HPCWORK directory from lustre18 to the new storage system lustre22, see also [2]. Over the last few weeks we have been migrating all HPCWORK data to the new file system; during this maintenance we will perform the final step of the migration. HPCWORK will not be available during the maintenance.
* migration of the old JupyterHub system to a new one

During this maintenance work, the login systems and the batch system will not be available. It is expected that the login systems will reopen in the early morning.

We do not expect the maintenance to last all day and plan to reopen the cluster earlier. However, HPCWORK will most likely not be available at that point, as the migration must be completed first. Jobs that rely on HPCWORK will fail because they cannot find their files. You should therefore stop such jobs and resubmit them later.
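
As a rough sketch of what stopping and resubmitting an affected job can look like with the standard Slurm tools (the job ID and script name below are placeholders):

  squeue -u $USER        # list your pending and running jobs
  scancel 1234567        # cancel a job that depends on HPCWORK (replace with the real job ID)
  sbatch jobscript.sh    # resubmit it once HPCWORK is available again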

[1] https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/8929
[2] https://maintenance.itc.rwth-aachen.de/ticket/status/messages/14/show_ticket/8960

With kind regards,

Your HPC Team

 

