
Welcome to the PADS Blog!

MicroPM4Py – Process Mining in Resource-Constrained Environments

January 31st, 2020

This post is by Alessandro Berti, Software Engineer in the Process And Data Science group at RWTH Aachen University. Contact him via email for further inquiries.

Process mining is a branch of data science that includes a wide set of techniques for automated process discovery, compliance/conformance checking, process simulation and prediction. Tool support for process mining has developed in recent years in different directions: standalone tools, libraries for the best-known programming languages, and cloud solutions that store and analyze event data. All of these need to support various standards, numerical computation and optimization techniques, and visualizations. Examples are the XES standard (XML) for storing event logs, Petri nets and their import/export from/to PNML (XML) files, and LP/ILP solvers for process discovery (ILP-based process discovery) and conformance checking (A* alignments).

Including such a vast set of functionalities in the more advanced tools and libraries (such as ProM 6.x or PM4Py) has led to uncontrolled growth in the amount of memory required to run them. At present, simply opening ProM 6.9 requires more than 650 MB of RAM, and importing PM4Py along with its dependencies requires more than 80 MB of RAM. Cloud solutions such as the Celonis IBC or MyInvenio, on the other hand, require an active Internet connection to transmit event data and to deliver process mining analyses and dashboards.

This rules out applying process mining on the most numerous class of computers in the world: microcontrollers. As an example, a modern car contains from 30 to 70 microcontrollers. Microcontrollers are usually adopted to monitor and control the operational status of a system (for example, acceleration, temperature, magnetic field, humidity). To minimize power consumption in idle and light-sleep modes, they come with a very limited amount of RAM (usually from 32 KB to 512 KB) and a very low-power CPU (examples are the Cortex-M0/M4/M7 CPUs). Running the existing process mining tools and libraries on them is simply impossible.

Hence, a new library (MicroPM4Py) is being developed in the MicroPython language (www.micropython.org), which is a complete reimplementation, for microcontrollers, of a basic set of features of Python 3.4. The aim of MicroPython is not to be faster than standard Python (indeed, standard Python is much faster than MicroPython for the same script) but to minimize the memory footprint of the application. The goal of MicroPM4Py is to enable some process mining features directly at the microcontroller level:

  • Full support for Petri nets without invisible transitions: a memory-efficient data structure supports the semantics of a Petri net, checking the fitness of a trace, and PNML import/export.
  • Basic support (A1 import; A1 export) for the XES standard for event logs: the traces can be loaded from the XES event log in different modes (full DFG; distinct variants; full list of traces in memory; iterator trace by trace).
  • Basic support for importing/exporting logs in the CSV format.
  • Basic support for process discovery (discovery of a DFG, DFG mining, Alpha Miner algorithm) on top of XES/CSV event logs.
  • Generation of the DOT (Graphviz) visualization of a DFG / Petri net.
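
To make the first and last bullet points more concrete, here is a minimal sketch of what replaying a trace on a Petri net without invisible transitions can look like, together with a DOT export. It is not MicroPM4Py's code or API: the names `PetriNet`, `trace_fits` and `net_to_dot` are made up for this example, and the sketch assumes that every activity labels exactly one transition, that all arcs have weight 1, and that place and transition names do not clash. The replay is strict: a trace fits only if every activity is enabled when it occurs and the final marking is reached.

```python
class PetriNet:
    """Labelled Petri net in which every transition is visible (hypothetical helper)."""

    def __init__(self):
        # activity label -> (input places, output places); assumes unique labels
        self.transitions = {}

    def add_transition(self, label, inputs, outputs):
        self.transitions[label] = (tuple(inputs), tuple(outputs))

    def enabled(self, marking, label):
        # Enabled if every input place holds at least one token (arc weight 1)
        inputs, _ = self.transitions[label]
        return all(marking.get(p, 0) > 0 for p in inputs)

    def fire(self, marking, label):
        # Consume one token per input place, produce one per output place
        inputs, outputs = self.transitions[label]
        new_marking = dict(marking)
        for p in inputs:
            new_marking[p] -= 1
        for p in outputs:
            new_marking[p] = new_marking.get(p, 0) + 1
        # Drop empty places so that markings can be compared with ==
        return {p: n for p, n in new_marking.items() if n > 0}


def trace_fits(net, initial, final, trace):
    """Strict replay: True only if the trace leads from initial to final marking."""
    marking = dict(initial)
    for activity in trace:
        if activity not in net.transitions or not net.enabled(marking, activity):
            return False
        marking = net.fire(marking, activity)
    return marking == final


def net_to_dot(net):
    """Render the net as a Graphviz DOT string (transitions drawn as boxes)."""
    lines = ["digraph net {"]
    for label in sorted(net.transitions):
        inputs, outputs = net.transitions[label]
        lines.append('  "%s" [shape=box];' % label)
        for p in inputs:
            lines.append('  "%s" -> "%s";' % (p, label))
        for p in outputs:
            lines.append('  "%s" -> "%s";' % (label, p))
    lines.append("}")
    return "\n".join(lines)


# A purely sequential example net: source -> register -> check -> decide -> sink
net = PetriNet()
net.add_transition("register", ["source"], ["p1"])
net.add_transition("check", ["p1"], ["p2"])
net.add_transition("decide", ["p2"], ["sink"])
initial, final = {"source": 1}, {"sink": 1}

print(trace_fits(net, initial, final, ["register", "check", "decide"]))  # True
print(trace_fits(net, initial, final, ["register", "decide"]))           # False
```

Keeping only a dictionary of tuples for the net and a small dictionary for the current marking is one way to keep the footprint low; the library itself may use a different layout.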

All the data structures have been optimized to minimize memory consumption. As an example, the following table estimates the RAM occupation (in bytes) of the three log structures in MicroPM4Py: reading the DFG from the XES file, reading the variants, and reading all the traces of the log. The numbers are generally competitive and place MicroPM4Py among the most memory-efficient XES importers (bear in mind the limitation: only the case ID and the activity are read).

| Log name | DFG obj size (bytes) | Variants obj size (bytes) | Loaded log obj size (bytes) |
|---|---|---|---|
| running-example | 3200 | 1568 | 2016 |
| receipt | 17112 | 22072 | 213352 |
| roadtraffic | 10696 | 37984 | 19392280 |
| LevelA1.xes | 6456 | 16368 | 154792 |
| BPIC17.xes | 29240 | 7528144 | 11474448 |
| BPIC15_4.xes | 453552 | 488200 | 587656 |
| BPIC17 – Offer log.xes | 3768 | 2992 | 6051880 |
| BPIC13_incidents.xes | 2696 | 380216 | 1333200 |
| BPIC15_3.xes | 600304 | 600312 | 743656 |
| BPIC15_1.xes | 541824 | 534216 | 652696 |
| BPIC15_5.xes | 555776 | 589688 | 703712 |
| BPIC12.xes | 20664 | 1819848 | 3352352 |
| BPIC13_closed_problems.xes | 2320 | 33504 | 226856 |
| BPIC15_2.xes | 551656 | 457496 | 531176 |
| BPIC13_open_problems.xes | 1872 | 16096 | 122672 |
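
The three log structures compared above, plus the trace-by-trace iterator, can be illustrated with a small sketch. This is not the MicroPM4Py importer: the function names (`iter_traces`, `read_dfg`, `read_variants`, `read_log`) are invented for the example, and the events are assumed to arrive as (case ID, activity) pairs already grouped by case, as they would when streaming a XES file.

```python
def iter_traces(events):
    """Yield (case_id, [activities]) one case at a time.

    Only the current trace is held in memory. Assumes that the events of
    each case are contiguous in the input.
    """
    current_case, current_trace = None, []
    for case_id, activity in events:
        if case_id != current_case and current_trace:
            yield current_case, current_trace
            current_trace = []
        current_case = case_id
        current_trace.append(activity)
    if current_trace:
        yield current_case, current_trace


def read_dfg(events):
    """Directly-follows graph: (activity, next activity) -> frequency."""
    dfg = {}
    for _, trace in iter_traces(events):
        for i in range(len(trace) - 1):
            edge = (trace[i], trace[i + 1])
            dfg[edge] = dfg.get(edge, 0) + 1
    return dfg


def read_variants(events):
    """Distinct variants: tuple of activities -> number of cases."""
    variants = {}
    for _, trace in iter_traces(events):
        key = tuple(trace)
        variants[key] = variants.get(key, 0) + 1
    return variants


def read_log(events):
    """Full log: one list of activities per case, all kept in memory."""
    return [trace for _, trace in iter_traces(events)]


# Toy event stream; only the case ID and the activity are considered
EVENTS = [
    ("c1", "register"), ("c1", "check"), ("c1", "decide"),
    ("c2", "register"), ("c2", "check"), ("c2", "decide"),
    ("c3", "register"), ("c3", "check"), ("c3", "check"), ("c3", "decide"),
]

dfg = read_dfg(EVENTS)            # 3 distinct edges
variants = read_variants(EVENTS)  # 2 distinct variants
log = read_log(EVENTS)            # 3 traces
```

In this representation the DFG grows with the number of distinct directly-follows pairs, the variants with the number of distinct traces, and the full log with the number of cases, which is consistent with the very different sizes reported per log in the table.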

The following table estimates the memory usage of a use case of MicroPM4Py: from a log, the DFG is obtained and the DFG mining technique is applied to obtain a Petri net. Then, an iterator is created on the log in order to iterate over its traces one by one. The memory usage of the MicroPM4Py module and data structures is then measured. The following aspects are taken into account (estimates were made on an x86-64 Debian 9 machine with Miniconda Python 3.7):

  • The maximum size of the iterator+current trace (in bytes)
  • The size of the DFG-mined Petri net (in bytes)
  • The size of the MicroPM4Py Python module (in bytes)
  • A deliberately overestimated size (16 KB) of a kernel + MicroPython interpreter running on a microcontroller (in bytes)

The values are summed to get an estimate of the memory usage of the application for this use case, for several real-life logs. Except for the BPIC 2015 logs, the memory allocation is always under 128 KB! 🙂

| Log name | Max XES iterable size (bytes) | DFG mining net size (bytes) | MicroPM4Py module size (bytes) | MicroPython size, est. max. (bytes) | Max est. size (bytes) |
|---|---|---|---|---|---|
| running-example | 2512 | 6384 | 20624 | 16384 | 45904 |
| receipt | 6312 | 27328 | 20624 | 16384 | 70648 |
| roadtraffic | 3416 | 18104 | 20624 | 16384 | 58528 |
| LevelA1.xes | 3656 | 9952 | 20624 | 16384 | 50616 |
| BPIC17.xes | 6664 | 41352 | 20624 | 16384 | 85024 |
| BPIC15_4.xes | 49560 | 757680 | 20624 | 16384 | 844248 |
| BPIC17 – Offer log.xes | 2392 | 6760 | 20624 | 16384 | 46160 |
| BPIC13_incidents.xes | 2696 | 5352 | 20624 | 16384 | 45056 |
| BPIC15_3.xes | 49912 | 1084872 | 20624 | 16384 | 1171792 |
| BPIC15_1.xes | 51592 | 993800 | 20624 | 16384 | 1082400 |
| BPIC15_5.xes | 54048 | 1027648 | 20624 | 16384 | 1118704 |
| BPIC12.xes | 6368 | 30744 | 20624 | 16384 | 74120 |
| BPIC13_closed_problems.xes | 2056 | 5160 | 20624 | 16384 | 44224 |
| BPIC15_2.xes | 54080 | 1012512 | 20624 | 16384 | 1103600 |
| BPIC13_open_problems.xes | 1864 | 4480 | 20624 | 16384 | 43352 |
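
For readers who want to check the arithmetic, the short script below reproduces the last column for three of the rows and compares it with a 128 KB budget. The figures are copied from the table; the names (`estimate`, `ROWS`, `BUDGET`) are just for this example and are not part of MicroPM4Py.

```python
KERNEL_AND_INTERPRETER = 16 * 1024  # 16384 bytes, the overestimate used above
MODULE_SIZE = 20624                 # MicroPM4Py module size, from the table
BUDGET = 128 * 1024                 # 131072 bytes

# log name -> (max iterator + current trace size, DFG-mined net size), in bytes
ROWS = {
    "running-example": (2512, 6384),
    "BPIC17.xes": (6664, 41352),
    "BPIC15_4.xes": (49560, 757680),
}


def estimate(iterable_size, net_size):
    """Sum the four components of the estimate (all values in bytes)."""
    return iterable_size + net_size + MODULE_SIZE + KERNEL_AND_INTERPRETER


totals = {name: estimate(*sizes) for name, sizes in ROWS.items()}
fits = {name: total <= BUDGET for name, total in totals.items()}

for name in ROWS:
    print(name, totals[name], "fits" if fits[name] else "exceeds 128 KB")
```

The module and interpreter contribute a fixed 37008 bytes, so whether a log stays under the budget depends almost entirely on the size of the DFG-mined net.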

While it is impossible at this level (microcontrollers) to support the wide set of features of other tools, it is still possible to run some process mining algorithms on microcontrollers. MicroPM4Py can also be deployed on old workstations or other kinds of low-power computers (such as the Raspberry Pi).

As future work, the library will include some other process models (e.g. transition systems, NFAs, continuous-time Markov chains).