IT Center Blog

aiXcelerate 2021 – I/O and Machine Learning in Spotlight

December 20th, 2021 | by

aiXcelerate is an annual tuning workshop for HPC users. It comprises lectures that are open to everyone and hands-on parts in which registered groups apply the concepts they have learned to their own codes. Every year, we focus on different hot topics. This year, from December 6th to 9th, aiXcelerate focused on performance tuning for I/O-intensive workloads and machine-learning applications. Due to the ongoing pandemic, aiXcelerate took place in an online format. Presentations were given live via Zoom, and a dedicated Slack channel was used for questions and discussions. We attracted more than 20 participants every day.

I/O

Understanding and improving the I/O behavior of scientific simulations is non-trivial, as it requires knowledge not only of the internals of a simulation code but also of the file systems of a given HPC system. The I/O part therefore spanned a wide spectrum of information, from the available file systems, their configuration and their expected behavior to details on the use of available I/O libraries. Similarly, the audience's existing knowledge of and interest in I/O topics were broad as well. During the different presentations, the audience was well engaged, asking questions both via Zoom and in the Slack discussion rooms.

ML

Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI) are becoming ever more important for researchers in all fields. This part of the workshop focused on giving users of the RWTH Compute Cluster the tools they need to run and optimize their code and workflows. Instead of diving deep into the theoretical background of ML and DL, the attendees were provided with practical examples and best practices using common software frameworks like scikit-learn, TensorFlow and PyTorch, illustrating how to run them efficiently on CLAIX with virtual environments or containers and what to look out for. This also included a case study on how to identify sub-optimal code through an in-depth performance analysis and how to fix it. Finally, this part was complemented with hands-on exercises demonstrating how to speed up training and evaluation using distributed Machine and Deep Learning across multiple GPUs as well as multiple compute nodes.

Bring-your-own (BYO) Code

One specialty of our aiXcelerate workshops is the “BYO code” part: HPC users can bring their own software codes to the workshop. Each user or user group works together with HPC experts and dedicated mentors from the IT Center during the workshop to investigate the performance of the BYO code on RWTH’s Compute Cluster CLAIX. To be part of this experience, HPC users had to apply for BYO explicitly and provide specific code details. This year, we worked intensively with a number of users and helped them deploy their code properly on CLAIX while focusing on I/O and ML behavior. All BYO participants gave very positive feedback and observed improvements in their compute jobs.

Responsible for the content of this article: Sandra Wienke.
