The Quest For Efficient Stochastic Conformance Checking

January 28th, 2025 | by Wetzeler, Detlef

This post has been authored by Eduardo Goulart Rocha.

Conformance Checking is a key field of process mining. At its heart, conformance checking deals with two questions:

How is a process deviating from its ideal flow?
How severe are these deviations?

To exemplify that, we consider a simplified hiring process inside a company and two event logs depicting its executions in two distinct business units (in the Netherlands and in Germany):

Event logs for a hiring process in a fictitious company’s Dutch (left) and German (right) business units

The logs contain a few violations. In this case, some applications are reviewed multiple times, and interviews are sometimes conducted before the applicant’s background is checked. These can be detected using state-of-the-art conformance checking techniques [1]. A process owner may decide that these violations are acceptable and update the reference model to allow them, leading to the following model:

Now, both event logs have the same set of variants and both achieve an alignment-based fitness and precision of 1. However, the two logs are not the same, and we intuitively know which one is preferable. Repeated CV screenings drain manual resources and should be minimized. Additionally, interviews are more effective when conducted after the background check (as more information on the candidate is available by then).

Why Stochastic Conformance Checking

The dilemma above serves as the starting point for stochastic conformance checking. While all flows are permitted, some are less desirable. Therefore, we would like to capture the preferred behavior of a model and leverage this information when evaluating an event log. In the literature, stochastic labeled Petri nets are used for that. These add weights on top of traditional labeled Petri nets, which should be interpreted as “whenever a set of transitions is enabled in a marking, each enabled transition fires with probability proportional to its weight”. Suppose we assign weights as follows:

This makes it clearer that while repeated reviews are possible, they should be the exception, and that the interview should preferably be conducted after checking references. This assigns ideal relative frequencies (probabilities) to each trace variant as follows:

Now, it is clear that the Dutch business unit is more conforming.
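The firing rule quoted above ("probability proportional to its weight") can be sketched in a few lines of Python. This is only an illustration: the function name `firing_probabilities` and the weights 9 and 1 are assumptions made here, not the weights of the model in this post.

```python
from fractions import Fraction


def firing_probabilities(enabled_weights):
    """Map each enabled transition to weight / (sum of all enabled weights)."""
    total = sum(enabled_weights.values())
    if total <= 0:
        raise ValueError("at least one enabled transition needs a positive weight")
    return {t: Fraction(w) / total for t, w in enabled_weights.items()}


# Hypothetical marking in which two transitions are enabled at the same time.
# The weights are made up for illustration only.
enabled = {"Check References": 9, "Interview": 1}

for transition, probability in firing_probabilities(enabled).items():
    print(transition, probability)
```

Under this reading, the probability of a trace variant is obtained by multiplying the firing probabilities along the path that produces it (and summing over paths if several produce the same trace).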

State of the Art in Stochastic Conformance Checking

In its simplest form, stochastic conformance checking aims at quantifying deviations while taking a process model’s stochastic perspective into account. An ideal stochastic conformance measure should have three properties:

It is robust to partial mismatches
It can be efficiently and exactly computed for a broad class of stochastic languages
It considers the log and model’s stochastic perspective

In recent years, multiple stochastic conformance measures have been proposed [2-6]. Unfortunately, state-of-the-art measures fall short of one or more of these properties. The table below summarizes their shortcomings:

Latest Development

In a recent work presented at ICPM 2024 [7], we took a small step toward improving on that. The main idea is to abstract the model’s and the log’s stochastic languages into an N-gram-like model (called its k-th order Markovian abstraction) that represents the relative frequency of each subtrace in the language. In our running example, when k = 2 we obtain:

Model and log abstractions: the relative frequency of each subtrace in their respective languages

RA = Review Application, CR = Check References, I = Interview
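For an event log, such an abstraction can be approximated by counting length-k subtraces, as in the minimal Python sketch below. The sketch rests on several assumptions that are not taken from [7]: the `START`/`END` markers, the normalization over all counted subtraces, the function name `subtrace_frequencies`, and the example log itself are all hypothetical. The exact definition in [7] may differ, and computing the abstraction of a model requires the analysis described there rather than simple counting.

```python
from collections import Counter


def subtrace_frequencies(log, k):
    """Relative frequency of each length-k subtrace in a log.

    `log` maps a trace (tuple of activity labels) to its number of occurrences.
    Traces are padded with artificial START/END markers (an assumption of this sketch).
    """
    counts = Counter()
    for trace, multiplicity in log.items():
        padded = ("START",) + tuple(trace) + ("END",)
        # Slide a window of length k over the padded trace.
        for i in range(len(padded) - k + 1):
            counts[padded[i:i + k]] += multiplicity
    total = sum(counts.values())
    return {subtrace: count / total for subtrace, count in counts.items()}


# Hypothetical log using the abbreviations RA, CR and I; the counts are made up.
example_log = {
    ("RA", "CR", "I"): 8,
    ("RA", "RA", "CR", "I"): 1,
    ("RA", "I", "CR"): 1,
}

for subtrace, frequency in sorted(subtrace_frequencies(example_log, 2).items()):
    print(subtrace, round(frequency, 3))
```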

This abstraction can then be compared using any existing stochastic conformance measure as illustrated in the framework below:

By using the language’s subtraces (instead of full traces), measures based on this abstraction are naturally more robust to partial mismatches in the data. Furthermore, in [7] we also show that this abstraction can be efficiently computed for bounded livelock-free stochastic labeled Petri nets. Lastly, the model’s abstraction does not depend on sampling and considers the model’s full behavior.
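To make the comparison step concrete, here is a minimal Python sketch that scores two abstractions by one minus their total variation distance. This measure is a simple stand-in chosen for illustration, not necessarily one of the measures evaluated in [7]. The function name `overlap` and the frequencies below are hypothetical.

```python
def overlap(abstraction_a, abstraction_b):
    """Return 1 minus the total variation distance between two frequency maps.

    Each argument maps a subtrace to its relative frequency (values sum to 1).
    The result is 1.0 for identical abstractions and 0.0 for disjoint ones.
    """
    subtraces = set(abstraction_a) | set(abstraction_b)
    distance = 0.5 * sum(
        abs(abstraction_a.get(s, 0.0) - abstraction_b.get(s, 0.0)) for s in subtraces
    )
    return 1.0 - distance


# Hypothetical abstractions for k = 2; the numbers are made up for illustration.
model_abstraction = {("RA", "CR"): 0.5, ("CR", "I"): 0.5}
log_abstraction = {("RA", "CR"): 0.25, ("RA", "I"): 0.25, ("CR", "I"): 0.5}

print(overlap(model_abstraction, log_abstraction))
```

Because the score is computed over subtraces, a trace that deviates in only one position still contributes the subtraces it shares with the model, which is the robustness to partial mismatches described above.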

Outlook

While this is some progress, there is still much work to be done in the field. First, the proposed abstraction cannot handle long-term dependencies. Second, we would like to provide diagnostics beyond a single number as feedback to the end user. Efficient and easy-to-use conformance methods are imperative for the development of stochastic process mining.

References

[1] Arya Adriansyah, Boudewijn F. van Dongen, Wil M. P. van der Aalst: Conformance Checking Using Cost-Based Fitness Analysis. EDOC 2011: 55-64
[2] Sander J. J. Leemans, Wil M. P. van der Aalst, Tobias Brockhoff, Artem Polyvyanyy: Stochastic Process Mining: Earth Movers’ Stochastic Conformance. Inf. Syst. 102: 101724 (2021)
[3] Sander J. J. Leemans, Fabrizio Maria Maggi, Marco Montali: Enjoy the Silence: Analysis of Stochastic Petri Nets with Silent Transitions. Inf. Syst. 124: 102383 (2024)
[4] Sander J. J. Leemans, Artem Polyvyanyy: Stochastic-Aware Conformance Checking: An Entropy-Based Approach. CAiSE 2020: 217-233
[5] Artem Polyvyanyy, Alistair Moffat, Luciano García-Bañuelos: An Entropic Relevance Measure for Stochastic Conformance Checking in Process Mining. ICPM 2020: 97-104
[6] Tian Li, Sander J. J. Leemans, Artem Polyvyanyy: The Jensen-Shannon Distance Metric for Stochastic Conformance Checking. ICPM Workshops 2024
[7] Eduardo Goulart Rocha, Sander J. J. Leemans, Wil M. P. van der Aalst: Stochastic Conformance Checking Based on Expected Subtrace Frequency. ICPM 2024: 73-80
