
IT Center Blog

Archer: Error Detection for HPC

July 12th, 2024

High-performance computing is developing rapidly, and computing capacities are reaching new heights. In this context, tools for error detection and correction are essential to ensure the reliability of applications. One such tool is Archer, which was developed specifically to detect data races in OpenMP programs.

 

 

Background and Necessity

With the increase in computing power, many software components need to be parallelised to achieve maximum efficiency. OpenMP has established itself as the preferred model for implementing this parallelisation within a compute node, as it is both portable and user-friendly. Parallelising with OpenMP introduces a new source of error, however: the data race, which complicates the development and maintenance of such programs. A data race occurs when multiple threads access the same memory location without synchronisation and at least one of those accesses is a write. In this case, the behaviour of the program is undefined, which can lead to unexpected and often erroneous results.
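To make this concrete, the following minimal C sketch shows a typical data race in an OpenMP loop next to a race-free variant. The function names `sum_racy` and `sum_reduction` are made up for illustration and are not taken from any real application.

```c
/* Racy version: all threads update the shared variable `sum`
 * without synchronisation, so individual updates can be lost. */
long sum_racy(const int *data, int n) {
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        sum += data[i];  /* unsynchronised read-modify-write: data race */
    }
    return sum;
}

/* Race-free version: the reduction clause gives every thread a private
 * copy of `sum` and combines the copies once the loop has finished. */
long sum_reduction(const int *data, int n) {
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
```

With an LLVM-based compiler, code like this is typically built with `-fopenmp -fsanitize=thread` so that the race in `sum_racy` is reported at run time; the exact setup on a given system may differ.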

 

The Challenge

Traditional OpenMP tools that search for data races could not keep up with modern, large and complex HPC applications. As a result, development was inefficient, and developers were sometimes forced to fall back on simpler sequential code to avoid hard-to-detect errors. One example of this is the parallelisation of the HYDRA application. A data race in the application caused sporadic crashes only once the application was scaled up to a full HPC machine. The undetected error delayed the successful parallelisation with OpenMP by six months.

 

Archer: A Solution

Archer has been under development since 2014 to detect such problems. It is a highly scalable and precise tool for detecting data races in OpenMP programs. Archer builds on the existing open source tool ThreadSanitizer (TSan) and extends its capabilities to specifically address the requirements of OpenMP programs. The vectorised race-checking analysis offers high scalability. The dynamic analysis leads to high accuracy and a very low false-positive rate. Information on the use of Archer can be found in the HPC Wiki.

Scalable methods for tracking Happens-Before:
The central criterion for a data race is the lack of synchronisation between competing memory accesses. Archer uses the architecture of TSan to check for data races efficiently, which makes it possible to handle large production OpenMP programs.
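The happens-before idea can be illustrated with vector clocks: two writes to the same location race if neither is ordered before the other by some synchronisation. The following is a heavily simplified sketch under that assumption, not the implementation used in TSan or Archer. All type and function names are hypothetical, the thread count is fixed, and only writes to a single memory location are tracked.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_THREADS 4  /* hypothetical fixed thread count for this sketch */

/* One logical clock value per thread. */
typedef struct {
    unsigned clock[MAX_THREADS];
} vclock;

/* The last write to one memory location: who wrote it, and when. */
typedef struct {
    bool valid;
    int tid;
    unsigned time;
} write_epoch;

/* Every thread starts at logical time 1 in its own component. */
void vc_init(vclock *vc, int tid) {
    memset(vc, 0, sizeof *vc);
    vc->clock[tid] = 1;
}

/* Synchronisation edge (for example lock release followed by acquire):
 * the acquirer learns everything the releaser knew, then the releaser
 * moves on to a new logical time. */
void vc_release_acquire(vclock *releaser, int releaser_tid, vclock *acquirer) {
    for (int i = 0; i < MAX_THREADS; i++) {
        if (releaser->clock[i] > acquirer->clock[i]) {
            acquirer->clock[i] = releaser->clock[i];
        }
    }
    releaser->clock[releaser_tid]++;
}

/* Record a write by thread `tid`. Returns true if the previous write
 * came from another thread and is not ordered before this one. */
bool record_write(write_epoch *last, const vclock *vc, int tid) {
    bool race = last->valid && last->tid != tid
                && last->time > vc->clock[last->tid];
    last->valid = true;
    last->tid = tid;
    last->time = vc->clock[tid];
    return race;
}
```

In this sketch, two threads that write the same location without calling `vc_release_acquire` in between are flagged, whereas the same two writes separated by such a synchronisation edge are not.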

Modular interfaces to OpenMP runtimes:
Archer has been designed to integrate modularly with various OpenMP runtimes. This is made possible by compliance with the OpenMP Tools Interface (OMPT), which is part of the OpenMP specification.

Collaboration with active projects:
Archer has already proven itself in real applications. In addition to the aforementioned problem in HYDRA, errors in various parallel applications were identified and eliminated during a routine check as part of a POP performance audit.

 

Intel and the Development of Archer

A few years ago, Intel began moving its compilers to the LLVM compiler infrastructure. As a result, Intel has not supported its own tool for detecting programming errors, Intel Inspector, since the beginning of the year.

To further increase the robustness and security of applications, the sanitisers from the LLVM project should now be used instead. These tools help to detect and fix memory errors, data races and other problems at an early stage of software development. Thanks to the change in compiler architecture, they are now also available in the new generation of Intel compilers (e.g. icx).

The IT Center has played a leading role in the development of Archer since the beginning and has been working for years on the integration and improvement of TSan in order to adapt it to the requirements of modern HPC environments. Since 2018, Archer has been an integral part of the LLVM project, where it continues to be developed. We are delighted that it is now becoming even more widespread thanks to its integration into the Intel compilers.

You can find details on using Archer in our HPC Wiki.


Responsible for the content of this article are Joachim Jenke, Simon Schwitanski, Malak Mostafa and Janin Iglauer.
