This post is by Alessandro Berti, Software Engineer in the Process And Data Science group at RWTH Aachen University. Contact him via email for further inquiries.
Process mining is a branch of data science that includes a wide set of techniques for automated process discovery, compliance/conformance checking, process simulation and prediction. Tool support for process mining has developed in recent years in different directions: standalone tools, libraries for the best-known programming languages, and cloud solutions that store and analyze event data. All of these require support for different standards, numerical computation and optimization techniques, and visualizations. Examples are the XES standard (XML) for the storage of event logs, the support for Petri nets and their importing/exporting from PNML (XML) files, and the inclusion of LP/ILP solvers for process discovery (ILP-based process discovery) and conformance checking (A* alignments).
The inclusion of a vast set of functionalities in the more advanced tools/libraries (like ProM 6.x or PM4Py) has therefore led to a large growth in the amount of memory required to run the tool/library. Currently, simply opening ProM 6.9 requires more than 650 MB of RAM, and importing PM4Py along with its dependencies requires more than 80 MB of RAM. On the other hand, cloud solutions such as the Celonis IBC or MyInvenio require an active Internet connection for the transmission of event data and the delivery of process mining analyses/dashboards.
This rules out applying process mining on the most numerous class of computers in the world: microcontrollers. As an example, a modern car contains from 30 to 70 microcontrollers. Microcontrollers are usually adopted to control the operational status of a system (for example, acceleration, temperature, magnetic field, humidity). To minimize power consumption in idle and light-sleep states, they come with a very limited amount of RAM (usually from 32 KB to 512 KB) and a very low-power CPU (examples are the Cortex M0/M4/M7 CPUs for microcontrollers). Running existing process mining tools/libraries on them is simply impossible.
Hence, a new library (MicroPM4Py) is being developed in the MicroPython language (www.micropython.org), which is a complete reimplementation for microcontrollers of a basic set of features of Python 3.4. The aim of MicroPython is not to be faster than standard Python (indeed, standard Python is much faster than MicroPython for the same script) but to minimize the memory footprint of the application. The goal of MicroPM4Py is to enable some process mining features directly at the microcontroller level:
- Full support for Petri nets without invisible transitions: a memory-efficient data structure is deployed that supports the semantics of a Petri net, the verification of the fitness of a trace, PNML importing/exporting.
- Basic support (A1 import; A1 export) for the XES standard for event logs: the traces can be loaded from the XES event log with different modalities (full DFG; distinct variants; full list of traces in memory; iterator trace by trace).
- Basic support for importing/exporting logs in the CSV format.
- Basic support for process discovery (discovery of a DFG, DFG mining, Alpha Miner algorithm) on top of XES/CSV event logs.
- Generation of the DOT (Graphviz) visualization of a DFG / Petri net.
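To make the first item of the list more concrete, here is a minimal sketch of how the fitness of a trace can be verified on a Petri net without invisible transitions. It is not the MicroPM4Py API: the class `PetriNet` and its methods `add_transition` and `replay` are hypothetical names chosen for illustration, and the sketch assumes one transition per activity label and arcs of weight 1.

```python
class PetriNet:
    """Tiny Petri net in which every transition has a visible activity label.

    Hypothetical illustration, not the MicroPM4Py data structure.
    Assumes one transition per label and arcs of weight 1.
    """

    def __init__(self):
        # label -> (input places, output places); places are plain strings
        self.transitions = {}

    def add_transition(self, label, inputs, outputs):
        self.transitions[label] = (tuple(inputs), tuple(outputs))

    def replay(self, trace, initial, final):
        """Return True if the trace can be fired step by step from the
        initial marking and ends exactly in the final marking."""
        marking = dict(initial)  # place -> number of tokens
        for activity in trace:
            if activity not in self.transitions:
                return False  # activity unknown to the model
            inputs, outputs = self.transitions[activity]
            # a transition is enabled only if each input place holds a token
            if any(marking.get(p, 0) < 1 for p in inputs):
                return False
            for p in inputs:
                marking[p] -= 1
            for p in outputs:
                marking[p] = marking.get(p, 0) + 1
        # ignore empty places when comparing with the final marking
        reached = {p: n for p, n in marking.items() if n > 0}
        return reached == dict(final)


net = PetriNet()
net.add_transition("register", ["start"], ["p1"])
net.add_transition("check", ["p1"], ["p2"])
net.add_transition("pay", ["p2"], ["end"])

fitting = net.replay(["register", "check", "pay"], {"start": 1}, {"end": 1})
skipping = net.replay(["register", "pay"], {"start": 1}, {"end": 1})
```

Since there are no invisible transitions, each event of the trace maps to exactly one firing, so the check needs only the current marking in memory and no search over possible silent moves.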
All the data structures have been optimized to minimize memory consumption. As an example, the following table estimates the RAM occupation (in bytes) of the three log structures in MicroPM4Py: the DFG read from the XES file, the variants, and all the traces of the log. Generally, the numbers are competitive and place the importer among the most memory-efficient XES importers to date (keep in mind the limitation: only the case ID and the activity are read).
Log name | DFG object size (bytes) | Variants object size (bytes) | Loaded log object size (bytes) |
running-example | 3200 | 1568 | 2016 |
receipt | 17112 | 22072 | 213352 |
roadtraffic | 10696 | 37984 | 19392280 |
LevelA1.xes | 6456 | 16368 | 154792 |
BPIC17.xes | 29240 | 7528144 | 11474448 |
BPIC15_4.xes | 453552 | 488200 | 587656 |
BPIC17 – Offer log.xes | 3768 | 2992 | 6051880 |
BPIC13_incidents.xes | 2696 | 380216 | 1333200 |
BPIC15_3.xes | 600304 | 600312 | 743656 |
BPIC15_1.xes | 541824 | 534216 | 652696 |
BPIC15_5.xes | 555776 | 589688 | 703712 |
BPIC12.xes | 20664 | 1819848 | 3352352 |
BPIC13_closed_problems.xes | 2320 | 33504 | 226856 |
BPIC15_2.xes | 551656 | 457496 | 531176 |
BPIC13_open_problems.xes | 1872 | 16096 | 122672 |
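The gap between the three columns follows from what each structure keeps: the DFG grows with the number of distinct activity pairs, the variants with the number of distinct traces, and the loaded log with the total number of events. This is why roadtraffic needs 10696 bytes as a DFG but 19392280 bytes as a loaded log. The sketch below builds the three structures for a toy log. It is an illustration only: the helper `deep_size` is hypothetical, relies on CPython's `sys.getsizeof`, and is not necessarily how the figures in the table were measured.

```python
import sys


def deep_size(obj, seen=None):
    """Rough recursive size in bytes of a nested container (CPython only).

    Hypothetical helper; objects shared between containers are counted once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


# toy log: only the activity of each event is kept, one tuple per case
traces = [
    ("register", "check", "pay"),
    ("register", "check", "pay"),
    ("register", "check", "reject"),
]

# 1) DFG: (source activity, target activity) -> frequency
dfg = {}
for trace in traces:
    for source, target in zip(trace, trace[1:]):
        dfg[(source, target)] = dfg.get((source, target), 0) + 1

# 2) variants: distinct activity sequence -> number of cases
variants = {}
for trace in traces:
    variants[trace] = variants.get(trace, 0) + 1

# 3) full log: every trace is kept in memory
full_log = list(traces)

sizes = {
    "dfg": deep_size(dfg),
    "variants": deep_size(variants),
    "full_log": deep_size(full_log),
}
```

On such a tiny log the three sizes are close to each other; the differences only become visible on logs with many cases that repeat a small number of behaviours.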
The following table estimates the memory usage of a use case of MicroPM4Py: from a log, the DFG is obtained and the DFG mining technique is applied to obtain a Petri net. Then, an iterator is created on the log in order to iterate over its single traces. The memory usage of the MicroPM4Py module and data structures is then measured. The following aspects are taken into account (the estimates were made on an x86-64 Debian 9 machine with Miniconda Python 3.7):
- The maximum size of the iterator+current trace (in bytes)
- The size of the DFG-mined Petri net (in bytes)
- The size of the MicroPM4Py Python module (in bytes)
- An overapproximated size estimate (16 KB) of a kernel plus the MicroPython interpreter running on a microcontroller (in bytes)
The values are summed to estimate the memory usage of the application for this use case, for several real-life logs. Except for the BPIC 2015 logs, the estimated memory usage is always under 128 KB! 🙂
Log name | Max XES iterator size (bytes) | DFG-mined Petri net size (bytes) | MicroPM4Py module size (bytes) | Kernel + MicroPython size, overestimate (bytes) | Estimated maximum total (bytes) |
running-example | 2512 | 6384 | 20624 | 16384 | 45904 |
receipt | 6312 | 27328 | 20624 | 16384 | 70648 |
roadtraffic | 3416 | 18104 | 20624 | 16384 | 58528 |
LevelA1.xes | 3656 | 9952 | 20624 | 16384 | 50616 |
BPIC17.xes | 6664 | 41352 | 20624 | 16384 | 85024 |
BPIC15_4.xes | 49560 | 757680 | 20624 | 16384 | 844248 |
BPIC17 – Offer log.xes | 2392 | 6760 | 20624 | 16384 | 46160 |
BPIC13_incidents.xes | 2696 | 5352 | 20624 | 16384 | 45056 |
BPIC15_3.xes | 49912 | 1084872 | 20624 | 16384 | 1171792 |
BPIC15_1.xes | 51592 | 993800 | 20624 | 16384 | 1082400 |
BPIC15_5.xes | 54048 | 1027648 | 20624 | 16384 | 1118704 |
BPIC12.xes | 6368 | 30744 | 20624 | 16384 | 74120 |
BPIC13_closed_problems.xes | 2056 | 5160 | 20624 | 16384 | 44224 |
BPIC15_2.xes | 54080 | 1012512 | 20624 | 16384 | 1103600 |
BPIC13_open_problems.xes | 1864 | 4480 | 20624 | 16384 | 43352 |
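The last column is a plain sum of the four components. As a worked check, the following sketch recomputes it for a few rows of the table and compares the result with a 128 KB budget. The names `estimate`, `rows` and `BUDGET` are only illustrative; the numbers are copied from the table above.

```python
MODULE_SIZE = 20624       # MicroPM4Py module size, from the table (bytes)
INTERPRETER_SIZE = 16384  # 16 KB overestimate for kernel + MicroPython (bytes)
BUDGET = 128 * 1024       # 128 KB of RAM, i.e. 131072 bytes

# log name -> (max XES iterator size, DFG-mined Petri net size), in bytes
rows = {
    "running-example": (2512, 6384),
    "receipt": (6312, 27328),
    "BPIC17.xes": (6664, 41352),
    "BPIC15_4.xes": (49560, 757680),
}


def estimate(iterator_size, net_size):
    """Sum the four components of the estimated memory usage."""
    return iterator_size + net_size + MODULE_SIZE + INTERPRETER_SIZE


totals = {name: estimate(*sizes) for name, sizes in rows.items()}
within_budget = {name: total <= BUDGET for name, total in totals.items()}

for name in rows:
    print(name, totals[name], "OK" if within_budget[name] else "over 128 KB")
```

The two fixed components already take 37008 bytes, so about 94 KB of a 128 KB budget remain for the iterator and the Petri net. For the BPIC 2015 logs the mined Petri net alone exceeds that.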
While it is impossible at this level (microcontrollers) to support the wide set of features of other tools, it is still possible to apply some process mining algorithms on microcontrollers. MicroPM4Py can also be deployed on old workstations or other kinds of low-power computers (such as the Raspberry Pi).
As future work, the library will include some other process models (e.g. transition systems, NFAs, continuous-time Markov chains).