
In parallel programming on supercomputers such as CLAIX, the Message Passing Interface (MPI) has been used for over 30 years for efficient communication between processes. Programming errors in MPI communication can cause problems such as deadlocks, data races, or type errors, and parallel execution makes such problems particularly likely. These errors are often hard for developers to find, especially when they occur non-deterministically, i.e., only sporadically.
To help developers detect errors in MPI programs, the IT Center has been developing the MPI correctness analysis tool MUST for over 10 years. MUST detects various types of MPI programming errors, from simple argument errors to the aforementioned deadlocks and data races, and reports them to the user. It attaches to the running program and detects these errors at runtime. We have already described MUST's basic functionality in a previous blog post. MUST is available to all interested users under an open-source license (BSD-3).
Version 1.11.0 of MUST, released at the beginning of the month, adds new functionality, which we briefly present below.
Analysis of MPI One-Sided Communication (RMA)
MPI offers different types of message exchange. The classic form is two-sided “message passing”: a sender sends its data to a receiver, which actively receives the message, so both processes take part in the communication. In one-sided communication (MPI One-Sided Communication, also called Remote Memory Access or RMA), a process can write data directly into memory that the target process has exposed in a so-called window, or read data from it. The target process therefore no longer has to actively receive the data. One-sided communication can offer efficiency advantages, but it also carries the risk of problems such as data races and deadlocks.
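How MUST detects RMA data races internally is not described in this post. As a rough illustration of what counts as a race, the following Python sketch assumes a simplified model: all remote accesses of one synchronization epoch (for example, between two MPI_Win_fence calls) are recorded, and two of them conflict if they touch overlapping bytes of the same target window and at least one of them is a write (MPI_Put). The names RmaAccess, overlaps, and find_races are hypothetical, and the model ignores accumulate operations as well as local loads and stores by the target process.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RmaAccess:
    """One remote access recorded in a single fence epoch (hypothetical model)."""
    origin: int  # rank that issued the MPI_Put / MPI_Get
    target: int  # rank whose window is accessed
    offset: int  # first byte accessed in the target window
    size: int    # number of bytes accessed
    kind: str    # "put" (remote write) or "get" (remote read)


def overlaps(a: RmaAccess, b: RmaAccess) -> bool:
    """True if both accesses hit the same target window and their byte ranges intersect."""
    return (a.target == b.target
            and a.offset < b.offset + b.size
            and b.offset < a.offset + a.size)


def find_races(epoch: list[RmaAccess]) -> list[tuple[RmaAccess, RmaAccess]]:
    """Return all pairs of overlapping accesses in one epoch where at least one is a put."""
    return [(a, b) for a, b in combinations(epoch, 2)
            if overlaps(a, b) and "put" in (a.kind, b.kind)]


# Ranks 1 and 2 both write to bytes 4..7 of rank 0's window in the same epoch.
epoch = [
    RmaAccess(origin=1, target=0, offset=0, size=8, kind="put"),
    RmaAccess(origin=2, target=0, offset=4, size=8, kind="put"),
    RmaAccess(origin=3, target=0, offset=16, size=8, kind="get"),
]
print(len(find_races(epoch)))  # 1
```

The sketch does not compare origin ranks, since MPI does not guarantee an order even between two puts issued by the same process within one epoch.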
Until now, MUST offered only basic support for MPI One-Sided Communication. With the new release, MUST can also detect simple argument errors as well as data races and deadlocks in this type of communication, which expands the range of MPI programs it can analyze.
Analysis of Neighborhood Collectives and I/O
In MPI, data can also be exchanged with so-called “collectives”, in which multiple processes take part in the same exchange. For example, one process can send data to all other processes. This can be more efficient than exchanging individual messages between pairs of processes. “Neighborhood Collectives”, a newer MPI feature, simplify the exchange between processes that are logical “neighbors” in a virtual process topology, such as a Cartesian grid. MUST now also analyzes this type of data exchange for deadlocks and so-called type errors. A type error occurs, for example, when a process sends data of one data type (e.g., “integer”) and the receiving process receives it as an incompatible data type (e.g., “float”).
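To make the notion of a type error more concrete, the following Python sketch (not MUST's implementation) models the matching rule for collectives: the sequence of basic types that is sent, the so-called type signature, must equal the sequence of basic types the receiver expects, even if the two sides describe it with different datatypes. The sketch assumes a hypothetical datatype model in which a type is either a predefined name such as "MPI_INT" or a pair (n, inner) standing for MPI_Type_contiguous(n, inner); special cases such as MPI_BYTE and MPI_PACKED are ignored.

```python
from __future__ import annotations

from typing import Union

# Hypothetical datatype model: a predefined type name such as "MPI_INT",
# or a pair (n, inner) standing for MPI_Type_contiguous(n, inner).
Datatype = Union[str, tuple]


def signature(count: int, dtype: Datatype) -> list[str]:
    """Flatten `count` elements of `dtype` into its sequence of basic types."""
    if isinstance(dtype, str):
        return [dtype] * count
    n, inner = dtype
    return signature(n, inner) * count


def collective_types_match(send_count: int, send_type: Datatype,
                           recv_count: int, recv_type: Datatype) -> bool:
    """Collectives require identical type signatures on sender and receiver."""
    return signature(send_count, send_type) == signature(recv_count, recv_type)


# 4 ints sent, 4 floats expected: a type error.
print(collective_types_match(4, "MPI_INT", 4, "MPI_FLOAT"))     # False
# 4 ints sent, one contiguous block of 4 ints expected: signatures match.
print(collective_types_match(4, "MPI_INT", 1, (4, "MPI_INT")))  # True
```

Unlike collectives, point-to-point receives may specify a larger count than the message actually sent, so a corresponding check for them would test for a prefix rather than for equality.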
Besides exchanging data between processes, MPI also allows data to be stored persistently on a file system. This functionality is called “MPI I/O”. With the new release, MUST can analyze this form of data storage as well.
Improvements in the Detection of Type Errors with TypeART
MUST also cooperates with other correctness analysis tools. One of them is TypeART, which tracks memory allocations in C/C++ programs and records the data type of each allocation. Using this type information, MUST can detect further inconsistencies between C/C++ data types and MPI data types, for example a buffer allocated for double values that is passed to an MPI call as MPI_INT. The new release extends TypeART support to newer compiler versions and extends MUST's use of it to more complex data types.
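The following Python sketch illustrates the idea behind this combination under simplified, hypothetical assumptions; it is not TypeART's or MUST's actual interface. A registry filled at allocation time (standing in for TypeART's compiler instrumentation) maps each buffer address to its C element type and element count, and a check at an MPI call compares that record with the MPI datatype and count passed to the call. The names record_alloc and check_mpi_buffer and the address values are made up for illustration, and only a few predefined MPI datatypes are mapped to their C counterparts.

```python
from __future__ import annotations

from dataclasses import dataclass

# A few predefined MPI datatypes and their C counterparts (MPI_INT <-> int, ...).
MPI_TO_C = {"MPI_INT": "int", "MPI_FLOAT": "float",
            "MPI_DOUBLE": "double", "MPI_CHAR": "char"}


@dataclass
class Allocation:
    c_type: str  # element type recorded at allocation time
    count: int   # number of elements allocated


# Hypothetical registry: buffer address -> allocation info.
allocations: dict[int, Allocation] = {}


def record_alloc(addr: int, c_type: str, count: int) -> None:
    """Stand-in for compiler instrumentation that logs each allocation's type."""
    allocations[addr] = Allocation(c_type, count)


def check_mpi_buffer(addr: int, count: int, mpi_type: str) -> str | None:
    """Return an error message if the MPI call's buffer description is inconsistent."""
    alloc = allocations.get(addr)
    expected = MPI_TO_C.get(mpi_type)
    if alloc is None or expected is None:
        return None  # unknown buffer or datatype not covered by this sketch
    if expected != alloc.c_type:
        return f"buffer holds {alloc.c_type}, but MPI call uses {mpi_type}"
    if count > alloc.count:
        return f"MPI call accesses {count} elements, buffer has only {alloc.count}"
    return None


record_alloc(0x1000, "double", 10)  # e.g. double *buf = malloc(10 * sizeof(double));
print(check_mpi_buffer(0x1000, 10, "MPI_INT"))
# buffer holds double, but MPI call uses MPI_INT
```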
Further Information
You can find out more about MUST and its development on the website, where you can also download it and find installation instructions. MUST is already installed on our HPC cluster CLAIX and available to all users.
Simon Schwitanski is responsible for the content of this article.