MicroPM4Py – Process Mining in Resource-Constrained Environments

January 31st, 2020 | by Pegoraro, Marco

This post is by Alessandro Berti, Software Engineer in the Process And Data Science group at RWTH Aachen University. Contact him via email for further inquiries.

Process mining is a branch of data science that comprises a wide range of techniques for automated process discovery, compliance/conformance checking, process simulation, and prediction. Tool support for process mining has developed in several directions in recent years: standalone tools, libraries for the most popular programming languages, and cloud solutions that store and analyze event data. All of these need to support various standards, numerical and optimization techniques, and visualizations. Examples are the XES standard (XML) for storing event logs, support for Petri nets and their import/export from/to PNML (XML) files, and the inclusion of LP/ILP solvers for process discovery (ILP-based process discovery) and conformance checking (A* alignments).

The inclusion of a vast set of functionalities in the more advanced tools/libraries (such as ProM 6.x or PM4Py) has therefore led to a large growth in the amount of memory required to run them. At the time of writing, simply opening ProM 6.9 requires more than 650 MB of RAM, and importing PM4Py along with its dependencies requires more than 80 MB of RAM. Cloud solutions such as the Celonis IBC or MyInvenio, on the other hand, require an active Internet connection for the transmission of event data and the delivery of process mining analyses/dashboards.

This rules out applying process mining on the most numerous class of computers in the world: microcontrollers. For example, a modern car contains between 30 and 70 microcontrollers. Microcontrollers are usually used to control the operational status of a system (for example, acceleration, temperature, magnetic field, humidity). To minimize power consumption in idle/light-sleep states, they come with a very limited amount of RAM (usually from 32 KB to 512 KB) and a very low-power CPU (for example, the Cortex-M0/M4/M7 cores). Running existing process mining tools/libraries on them is simply impossible.

Hence, a new library (MicroPM4Py) is being developed in the MicroPython language (www.micropython.org), a complete reimplementation for microcontrollers of a basic set of features of Python 3.4. The aim of MicroPython is not to be faster than standard Python (indeed, standard Python is much faster than MicroPython for the same script) but to minimize the memory footprint of the application. The goal of MicroPM4Py is to enable some process mining features directly at the microcontroller level:

Full support for Petri nets without invisible transitions: a memory-efficient data structure is deployed that supports the semantics of a Petri net, the verification of the fitness of a trace, PNML importing/exporting.
Basic support (A1 import; A1 export) for the XES standard for event logs: the traces can be loaded from the XES event log in different modes (full directly-follows graph, DFG; distinct variants; full list of traces in memory; trace-by-trace iterator).
Basic support for importing/exporting logs in the CSV format.
Basic support for process discovery (discovery of a DFG, DFG mining, Alpha Miner algorithm) on top of XES/CSV event logs.
Generation of the DOT (Graphviz) visualization of a DFG / Petri net.
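To make the first item more concrete, here is a minimal sketch of how a Petri net without invisible transitions can be stored compactly and used to check whether a trace fits. This is not the MicroPM4Py API: the class name `PetriNet`, its methods, and the example net are hypothetical and chosen only for illustration. The sketch assumes unique activity labels, arc weight 1, and distinct input places per transition.

```python
class PetriNet:
    """Tiny labelled Petri net (hypothetical, not the MicroPM4Py class).

    Assumptions: each activity labels exactly one transition, all arcs
    have weight 1, and no invisible transitions exist.
    """

    def __init__(self):
        # activity label -> (tuple of input places, tuple of output places)
        self.transitions = {}

    def add_transition(self, label, inputs, outputs):
        self.transitions[label] = (tuple(inputs), tuple(outputs))

    def fits(self, trace, initial, final):
        """Return True if `trace` replays from `initial` to exactly `final`."""
        marking = dict(initial)  # place -> token count
        for activity in trace:
            if activity not in self.transitions:
                return False
            inputs, outputs = self.transitions[activity]
            # A transition is enabled only if every input place holds a token.
            if any(marking.get(place, 0) < 1 for place in inputs):
                return False
            for place in inputs:
                marking[place] -= 1
            for place in outputs:
                marking[place] = marking.get(place, 0) + 1
        # Ignore empty places when comparing with the final marking.
        reached = {place: n for place, n in marking.items() if n > 0}
        return reached == dict(final)


# Example net: register -> check -> (pay | reject)
net = PetriNet()
net.add_transition("register", ["p0"], ["p1"])
net.add_transition("check", ["p1"], ["p2"])
net.add_transition("pay", ["p2"], ["p3"])
net.add_transition("reject", ["p2"], ["p3"])

initial, final = {"p0": 1}, {"p3": 1}
print(net.fits(["register", "check", "pay"], initial, final))  # True
print(net.fits(["register", "pay"], initial, final))           # False
```

Because there are no invisible transitions, replay is a single pass over the trace with no search, which keeps both the code and the working memory small.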

All the data structures have been optimized to minimize memory consumption. As an example, the following table estimates the RAM occupation (in bytes) of the three log structures in MicroPM4Py: reading the DFG from the XES; reading the variants; reading all the traces from the log. Overall, the numbers are competitive and place the importer among the most memory-efficient XES importers written so far (keep in mind the limitation: only the case ID and the activity are read).

Log name	DFG obj size	Variants obj size	Loaded log obj size
running-example	3200	1568	2016
receipt	17112	22072	213352
roadtraffic	10696	37984	19392280
LevelA1.xes	6456	16368	154792
BPIC17.xes	29240	7528144	11474448
BPIC15_4.xes	453552	488200	587656
BPIC17 – Offer log.xes	3768	2992	6051880
BPIC13_incidents.xes	2696	380216	1333200
BPIC15_3.xes	600304	600312	743656
BPIC15_1.xes	541824	534216	652696
BPIC15_5.xes	555776	589688	703712
BPIC12.xes	20664	1819848	3352352
BPIC13_closed_problems.xes	2320	33504	226856
BPIC15_2.xes	551656	457496	531176
BPIC13_open_problems.xes	1872	16096	122672
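The three structures compared in the table can be illustrated with a small sketch. It builds a DFG, a variants dictionary, and a fully loaded log from a toy list of traces, and estimates their sizes with a recursive walk over CPython's `sys.getsizeof`. The helper names (`deep_size`, `build_structures`) and the toy log are hypothetical; the post does not say which tool produced the numbers in the table, so this shows only one possible way to obtain such estimates.

```python
import sys


def deep_size(obj, seen=None):
    """Rough recursive size in bytes (CPython); shared objects count once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size(key, seen) + deep_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size(item, seen)
    return size


def build_structures(traces):
    """Build the DFG, the variants and the full log in one pass."""
    dfg = {}       # (activity, next activity) -> frequency
    variants = {}  # tuple of activities -> number of cases
    log = []       # every trace kept in memory
    for trace in traces:
        for edge in zip(trace, trace[1:]):
            dfg[edge] = dfg.get(edge, 0) + 1
        key = tuple(trace)
        variants[key] = variants.get(key, 0) + 1
        log.append(list(trace))
    return dfg, variants, log


# Toy log: 1000 cases, but only two distinct variants.
toy_log = [["register", "check", "pay"]] * 500 + [["register", "check", "reject"]] * 500
dfg, variants, log = build_structures(toy_log)

for name, structure in (("DFG", dfg), ("variants", variants), ("loaded log", log)):
    print(name, deep_size(structure), "bytes")
```

On a log with many cases and few variants, the loaded log grows with the number of cases, while the DFG and the variants grow only with the number of distinct behaviours. The `roadtraffic` and `BPIC17 – Offer log.xes` rows show the same pattern.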

The following table estimates the memory usage of a use case of MicroPM4Py: from a log, the DFG is obtained and the DFG mining technique is applied to obtain a Petri net. Then, an iterator is created on the log in order to iterate over its individual traces. The memory usage of the MicroPM4Py module and data structures is then measured. The following aspects are taken into account (estimations were done on an x86-64 Debian 9 machine with Miniconda Python 3.7):

The maximum size of the iterator+current trace (in bytes)
The size of the DFG-mined Petri net (in bytes)
The size of the MicroPM4Py Python module (in bytes)
An overapproximated estimate (16 KB) of the size of a kernel + MicroPython interpreter running on a microcontroller (in bytes)

The values are summed to get an estimate of the memory usage of the application for this use case, for several real-life logs. Except for the BPIC 2015 logs, the memory allocation is always under 128 KB! 🙂

Log name	Max XES iterable size	DFG mining net size	MicroPM4Py module size	MicroPython size (est.max.)	MAX EST. SIZE
running-example	2512	6384	20624	16384	45904
receipt	6312	27328	20624	16384	70648
roadtraffic	3416	18104	20624	16384	58528
LevelA1.xes	3656	9952	20624	16384	50616
BPIC17.xes	6664	41352	20624	16384	85024
BPIC15_4.xes	49560	757680	20624	16384	844248
BPIC17 – Offer log.xes	2392	6760	20624	16384	46160
BPIC13_incidents.xes	2696	5352	20624	16384	45056
BPIC15_3.xes	49912	1084872	20624	16384	1171792
BPIC15_1.xes	51592	993800	20624	16384	1082400
BPIC15_5.xes	54048	1027648	20624	16384	1118704
BPIC12.xes	6368	30744	20624	16384	74120
BPIC13_closed_problems.xes	2056	5160	20624	16384	44224
BPIC15_2.xes	54080	1012512	20624	16384	1103600
BPIC13_open_problems.xes	1864	4480	20624	16384	43352
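The last column is simply the sum of the four components. The short sketch below reproduces that arithmetic for a few rows of the table and compares each total with a 128 KB budget. The figures are copied from the table; the names `estimate`, `rows` and `BUDGET` are only illustrative.

```python
BUDGET = 128 * 1024        # 131072 bytes
MODULE_SIZE = 20624        # MicroPM4Py module size, from the table
INTERPRETER_SIZE = 16384   # kernel + MicroPython interpreter (est. max.)

# log name -> (max XES iterable size, DFG mining net size), from the table
rows = {
    "running-example": (2512, 6384),
    "receipt": (6312, 27328),
    "BPIC17.xes": (6664, 41352),
    "BPIC15_4.xes": (49560, 757680),
}


def estimate(iterable_size, net_size):
    """Sum the four components of the estimated memory usage (bytes)."""
    return iterable_size + net_size + MODULE_SIZE + INTERPRETER_SIZE


totals = {name: estimate(*sizes) for name, sizes in rows.items()}
for name, total in totals.items():
    verdict = "fits" if total <= BUDGET else "exceeds budget"
    print(name, total, verdict)
```

For the four rows above, the totals are 45904, 70648, 85024 and 844248 bytes, so only the BPIC 2015 log exceeds the 128 KB budget. The dominant term there is the size of the DFG-mined Petri net.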

While it is impossible at this level (microcontrollers) to support the wide set of features of other tools, it is still possible to run some process mining algorithms on microcontrollers. MicroPM4Py can also be deployed on old workstations or other kinds of low-power computers (such as the Raspberry Pi).

As future work, the library will include some other process models (e.g. transition systems, NFAs, continuous-time Markov chains).
