Tag: ‘HPC’
Slurm Update for CLAIX
We updated Slurm to a newer and more stable version: 25.05.5.
This upgrade fixed issues we had with our scheduling system and internal rights-management database.
Based on user feedback and internal metrics, we improved how Slurm calculates priorities for pending jobs. Here is a short summary, followed by more details.
In short:
– Job waiting times will be more predictable and intuitive.
– Longer waiting times will increase the priority of a pending job.
– Jobs will still be able to access resources quickly if their project’s recent resource usage is low.
Details:
– After 24 hours of waiting, a pending job will no longer be delayed by newly submitted jobs (i.e., its expected start time will no longer move further into the future because of them).
– Note that software or hardware malfunctions might still delay jobs, but newly submitted jobs will no longer do so after 24 hours of waiting.
– A pending job with a low fair-share factor might still be delayed by new jobs with higher fair-share factors during its first 24 hours of waiting.
– The project a job is charged to (the default project or an explicitly specified one) determines the job’s fair-share priority factor, based on that project’s recent resource usage.
– Projects that have already used their “fair share” of resources will have a lower fair-share priority factor than projects with lower recent resource usage.
– Priorities and fair-share priority factors only matter for comparing jobs waiting for the same resources (e.g., the same partition).
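You can inspect these factors for your own jobs with the standard Slurm command-line tools. The following is a minimal sketch; the job ID 1234567 and the project name “myproject” are placeholders, and the exact columns shown depend on the cluster configuration:

  # Show the priority components (age, fair-share, ...) of a pending job
  sprio -l -j 1234567
  # Show the start time Slurm currently estimates for that pending job
  squeue --start -j 1234567
  # Show the fair-share standing of your project
  sshare -l -A myproject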
Slurm GPU resource allocation changing on 1 November 2025
Starting on 1 November 2025, the CLAIX HPC systems will change the way GPU resources are requested and allocated.
Users submitting Slurm jobs will no longer be able to request arbitrary amounts of CPU and memory resources when using GPUs on GPU nodes.
Requesting an entire GPU node’s memory or all of its CPUs while requesting only a single GPU will no longer be possible.
Each GPU within a GPU node will have a strict maximum number of CPUs and amount of memory that can be requested with it.
To obtain more than this per-GPU maximum of CPUs or memory, additional GPUs will need to be requested.
The specific per-GPU limits on GPU nodes will eventually be documented separately.
Users are expected to modify their submission scripts or methods accordingly.
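As an illustration, a job that previously requested a single GPU together with most of a node’s CPUs and memory will instead need to keep its CPU and memory request proportional to the number of GPUs. The sketch below uses standard Slurm options; the values of 24 CPUs and 90 GB of memory per GPU, as well as the application name, are placeholders only, not the actual CLAIX limits, which will be documented separately:

  #!/usr/bin/env bash
  #SBATCH --job-name=gpu-example
  #SBATCH --gres=gpu:2            # request two GPUs
  #SBATCH --cpus-per-gpu=24       # placeholder per-GPU CPU limit
  #SBATCH --mem-per-gpu=90G       # placeholder per-GPU memory limit
  #SBATCH --time=01:00:00

  srun ./my_gpu_application       # placeholder application

With such a proportional request, needing more CPUs or memory than the per-GPU maximum automatically translates into requesting more GPUs.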
This change is driven by our efforts to update the HPC resource billing mechanism to comply with NHR HPC directives.
NHR requires that computing projects apply for CPU and GPU resources independently.
NHR also requires that HPC centers track the use of these CPU and GPU resources.
These independently granted resources are then accounted for per Slurm job on our CLAIX nodes.
Therefore, CPU nodes will only account for the CPUs used (and equivalent memory), and GPU nodes will only account for the GPUs used.
The quota tools will eventually reflect this too.
Last submissions to Rocky 8 HPC nodes possible until July 27th, 2025
Starting on July 27th, no further submissions to the Rocky 8 HPC nodes will be possible. However, any jobs already in the queue will still run on the remaining Rocky 8 nodes.
All remaining Rocky 8 nodes will be migrated to Rocky 9 as they become available.

