IT Center Changes

Runtime limits for GPU Jobs

9 March 2026

On 6 March 2026, we changed how many GPU nodes in the c23g partition are allowed to run user GPU jobs with runtimes longer than 24 hours.
The change has the following main goals:

  • Increase the throughput and reduce the waiting times for GPU jobs with runtimes shorter than 24 hours.
  • Encourage users to submit shorter jobs and to use resilience techniques such as checkpointing where needed.
  • Reduce the maintenance downtime of GPU nodes.

In effect, the change restricts long-running GPU jobs to half of the GPU nodes in the c23g partition. Long-running GPU jobs are user GPU jobs with runtimes longer than 24 hours. GPU jobs with runtimes of less than 24 hours are considered short and can be scheduled on all available GPU nodes of the c23g partition.
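To check which category a job falls into before submitting it, the rule above can be sketched in a few lines of Python. This is an illustration only, not a tool provided by the cluster: the function names `parse_slurm_time` and `classify` are hypothetical, and the sketch assumes that "runtime" means the time limit requested for the job, written in one of the usual Slurm time formats. The rule does not say how a limit of exactly 24 hours is treated, so the sketch reports that case separately instead of guessing.

```python
from datetime import timedelta

# Threshold taken from the announcement; everything else here is illustrative.
LONG_JOB_THRESHOLD = timedelta(hours=24)


def parse_slurm_time(spec: str) -> timedelta:
    """Parse a Slurm-style time limit into a timedelta.

    Handles "MM", "MM:SS", "HH:MM:SS", "D-HH", "D-HH:MM" and "D-HH:MM:SS".
    """
    text = spec.strip()
    days = 0
    if "-" in text:
        # With a day prefix, the remaining fields are hours[:minutes[:seconds]].
        day_part, text = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in text.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time limit: {spec!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time limit: {spec!r}")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def classify(spec: str) -> str:
    """Return "short", "long", or "boundary" for a requested time limit."""
    limit = parse_slurm_time(spec)
    if limit > LONG_JOB_THRESHOLD:
        return "long"
    if limit < LONG_JOB_THRESHOLD:
        return "short"
    # Exactly 24 hours: the rule as stated does not cover this case.
    return "boundary"
```

For example, `classify("23:59:59")` yields `"short"` and `classify("2-12")` (two days and twelve hours) yields `"long"`.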
We expect waiting times for long-running GPU jobs to increase, and we therefore encourage users to adapt their workflows to shorter-running GPU jobs.
This change is necessary to improve the quality of service (QoS) for users of the c23g partition and to allow faster maintenance work on the GPU nodes.
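One way to split a long computation into several shorter jobs is checkpointing: the program periodically saves its state and, when started again, resumes from the last saved state. The following Python sketch shows the general pattern using only the standard library. It is a minimal illustration under stated assumptions, not a recommended implementation: the function names, the JSON state format, and the `max_steps_this_job` budget are all hypothetical, and the "work" is a placeholder sum. Real GPU workloads would typically save model and optimizer state with their framework's own mechanisms.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"step": 0, "total": 0}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place.

    The rename replaces the old checkpoint in one step, so a job that is
    stopped mid-write does not leave a half-written checkpoint behind.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_name, path)


def run(path: Path, target_steps: int, max_steps_this_job: int,
        save_every: int = 10) -> dict:
    """Resume from the checkpoint and do at most max_steps_this_job steps."""
    state = load_checkpoint(path)
    done_this_job = 0
    while state["step"] < target_steps and done_this_job < max_steps_this_job:
        state["total"] += state["step"]  # placeholder for the real work
        state["step"] += 1
        done_this_job += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    # Always save once more before exiting so no completed step is lost.
    save_checkpoint(path, state)
    return state
```

Each job calls `run` with the same checkpoint path; a follow-up job then continues where the previous one stopped. In a real job, the step budget would more likely be a wall-clock budget chosen to stay safely below the requested time limit.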
