This post is by Alessandro Berti, Software Engineer in the Process And Data Science group at RWTH Aachen University. Contact him via email for further inquiries.
In process mining projects, a large share of the time is spent on the ETL phase, that is, on extracting the event logs and the process models. Process models can be discovered from event logs by applying a classic process mining technique (such as the Inductive Miner). An event log is organized in cases and events, where a case groups the events that belong to the same instance of the process. However, obtaining an event log from a database is tricky: it requires specifying a case notion, that is, a set of attributes/columns that groups the events into cases. Specifying a case notion is often non-trivial:
- It requires deep knowledge of the process(es) stored in the database
- It often requires joining several entities, which makes the extraction slow
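To make the idea of a case notion concrete, here is a minimal sketch in plain Python. The rows, the column names (`order_id`, `activity`, `timestamp`) and the helper `build_event_log` are hypothetical and only illustrate how a chosen set of columns groups events into cases; a real extraction would run against the database and typically involve joins.

```python
from collections import defaultdict

# Hypothetical rows, as they might come out of a database query.
rows = [
    {"order_id": "o1", "activity": "Create Order", "timestamp": 1},
    {"order_id": "o2", "activity": "Create Order", "timestamp": 2},
    {"order_id": "o1", "activity": "Pay Order", "timestamp": 3},
    {"order_id": "o2", "activity": "Cancel Order", "timestamp": 4},
]


def build_event_log(rows, case_notion):
    """Group rows into cases keyed by the values of the case-notion columns."""
    cases = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in case_notion)
        cases[key].append(row)
    # Order the events of each case by time to obtain its trace.
    for trace in cases.values():
        trace.sort(key=lambda event: event["timestamp"])
    return dict(cases)


# Here the case notion is the single column "order_id" (an assumption).
log = build_event_log(rows, case_notion=["order_id"])
for case_id, trace in log.items():
    print(case_id, [event["activity"] for event in trace])
```

Choosing a different set of columns as the case notion would produce a different event log from the same rows, which is why this choice requires knowledge of the underlying process.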
Without expertise from both the process and the IT worlds, it seems difficult to extract an event log from a database. In recent years, some research has aimed at simplifying the extraction process, including the following papers:
- Calvanese, Diego, et al. “Ontology-driven extraction of event logs from relational databases.” International Conference on Business Process Management. Springer, Cham, 2015. This paper provides a way to translate SPARQL queries, which are easier to express, into SQL queries.
- Li, Guangming, Renata Medeiros de Carvalho, and Wil MP van der Aalst. “Automatic discovery of object-centric behavioral constraint models.” International Conference on Business Information Systems. Springer, Cham, 2017. This paper presents a modeling technique on top of databases that is able to discover some interesting patterns for conformance checking applications.
- de Murillas, Eduardo Gonzalez Lopez, et al. “Audit Trails in OpenSLEX: Paving the Road for Process Mining in Healthcare.” International Conference on Information Technology in Bio-and Medical Informatics. Springer, Cham, 2017. This paper presents a meta-model into which database events can be inserted. The meta-model can be queried using SQL, which reduces the complexity of event log retrieval.
The PADS team developed StarStar models, a novel way to represent event data on top of relational databases. The main view provided by StarStar models is the A2A (activities) multigraph, in which a pair of activities can be connected by several edges.
An edge in the A2A multigraph is associated with a class perspective, and is inferred by observing directly-follows relationships between events from the perspective of some object belonging to the given class. StarStar models also provide a drill-down functionality: given a class perspective, a classic event log can be obtained and used with mainstream process mining techniques.
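The following is a simplified sketch of how such a multigraph could be derived; it is not the StarStar implementation. It assumes, hypothetically, that each event lists the objects it relates to as (class, object id) pairs. The names `drill_down` and `a2a_multigraph` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical events; each one refers to objects as (class, object id) pairs.
events = [
    {"activity": "Create Order", "timestamp": 1,
     "objects": [("order", "o1")]},
    {"activity": "Create Delivery", "timestamp": 2,
     "objects": [("order", "o1"), ("delivery", "d1")]},
    {"activity": "Ship", "timestamp": 3,
     "objects": [("order", "o1"), ("delivery", "d1")]},
    {"activity": "Send Invoice", "timestamp": 4,
     "objects": [("order", "o1")]},
]


def drill_down(events, cls):
    """Return a classic event log for one class: one trace per object."""
    traces = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        for obj_cls, obj_id in event["objects"]:
            if obj_cls == cls:
                traces[obj_id].append(event["activity"])
    return dict(traces)


def a2a_multigraph(events):
    """Count directly-follows pairs per class; keys are (source, target, class)."""
    edges = Counter()
    classes = {obj_cls for e in events for obj_cls, _ in e["objects"]}
    for cls in classes:
        for trace in drill_down(events, cls).values():
            for source, target in zip(trace, trace[1:]):
                edges[(source, target, cls)] += 1
    return edges


graph = a2a_multigraph(events)
for (source, target, cls), count in sorted(graph.items()):
    print(f"{source} -> {target} [{cls}] x{count}")
```

In this toy data, "Create Delivery" and "Ship" are connected by two parallel edges, one for the `order` class and one for the `delivery` class, which is what makes the structure a multigraph rather than a plain directed graph.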
StarStar models seem very promising for process visualization, since each edge can be annotated with its frequency or its performance (along with the class perspective for which the edge was detected).
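As a rough illustration of these annotations, the sketch below computes a frequency and a mean elapsed time for each edge. The input layout (activity, timestamp in seconds, class, object id) and the function `annotate_edges` are assumptions made for this example, and the mean time between two events is only one possible performance measure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: (activity, timestamp in seconds, class, object id).
rows = [
    ("Create Order", 0, "order", "o1"),
    ("Ship", 60, "order", "o1"),
    ("Create Order", 10, "order", "o2"),
    ("Ship", 40, "order", "o2"),
    ("Ship", 60, "delivery", "d1"),
    ("Deliver", 200, "delivery", "d1"),
]


def annotate_edges(rows):
    """Return frequency and mean elapsed time per (source, target, class) edge."""
    per_object = defaultdict(list)
    for activity, timestamp, cls, obj_id in rows:
        per_object[(cls, obj_id)].append((timestamp, activity))

    durations = defaultdict(list)
    for (cls, _), object_events in per_object.items():
        object_events.sort()  # order each object's events by timestamp
        for (t1, source), (t2, target) in zip(object_events, object_events[1:]):
            durations[(source, target, cls)].append(t2 - t1)

    return {
        edge: {"frequency": len(values), "mean_seconds": mean(values)}
        for edge, values in durations.items()
    }


annotations = annotate_edges(rows)
for edge, stats in annotations.items():
    print(edge, stats)
```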