
Welcome to the PADS Blog!

Can AI Really Model Your Business Processes? A Deep Dive into LLMs and BPM

February 13th, 2025

This post has been authored by Humam Kourani.

Business process modeling (BPM) is crucial for understanding, analyzing, and improving how a company operates. Traditionally, this has involved painstaking manual work. But what if Artificial Intelligence could lend a hand? Large Language Models (LLMs) are showing promise in this area, offering the potential to automate and enhance the BPM process. Let’s dive into how LLMs are being used, how effective they can be, and what the research shows.

What is Process Modeling (and Why Does it Matter)?

Before we get to the AI, let’s quickly recap the basics. Process modeling is all about representing the steps, actions, and interactions within a business process. The goal is to:

  • Understand: Make complex operations clear and visible.
  • Analyze: Identify bottlenecks, inefficiencies, and areas for improvement.
  • Improve: Optimize workflows for better performance, reduced costs, and increased customer satisfaction.

Enter the LLMs: AI-Powered Process Modeling

The core idea is to leverage the power of LLMs to automatically generate and refine process models based on natural language descriptions. Imagine simply describing your process in plain text, and having an AI create a BPMN diagram for you! The general framework involves:

  1. Start with Natural Language: You describe the process in words.
  2. Use POWL as an Intermediate: The LLM translates the description into the Partially Ordered Workflow Language (POWL).
  3. Generate Standard Models: The POWL representation is then converted into standard notations like BPMN or Petri nets.
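The three steps above can be sketched in code. This is a minimal illustrative pipeline, not the framework's actual implementation: `call_llm`, `describe_to_powl`, and `powl_to_bpmn` are hypothetical stand-ins for a real LLM API call and the real POWL-to-BPMN conversion.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., via an AI provider's SDK).
    Here we just return canned POWL-style pseudocode."""
    return "sequence(['receive order', 'check stock', 'ship goods'])"

def describe_to_powl(description: str) -> str:
    """Steps 1 -> 2: turn a natural language description into a POWL model."""
    prompt = f"Generate a POWL model for this process:\n{description}"
    return call_llm(prompt)

def powl_to_bpmn(powl_code: str) -> str:
    """Step 3: convert the POWL representation into a standard notation.
    A toy stand-in; the real framework converts POWL objects to BPMN/Petri nets."""
    return f"<bpmn:process>{powl_code}</bpmn:process>"

model = powl_to_bpmn(describe_to_powl(
    "An order is received, stock is checked, then goods are shipped."))
print(model)
```

In practice the LLM output is executable model-constructing code rather than a string, and the conversion step is handled by a process mining library.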

Figure 1: AI-Powered process modeling using POWL for intermediate representation.

Why POWL for intermediate representation?

The Partially Ordered Workflow Language (POWL) [1] serves as a crucial bridge in our AI-powered process modeling framework. Unlike some traditional modeling notations, POWL is designed with a hierarchical, semi-block structure that inherently guarantees soundness. This means we can avoid common modeling errors like deadlocks. Furthermore, POWL is more expressive than other hierarchical modeling languages that provide similar quality guarantees. The resulting POWL models can be seamlessly converted into standard notations like BPMN and Petri nets for wider use.
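To make the idea concrete, here is a simplified sketch of POWL's building blocks: activities combined by operators (exclusive choice, loop) and partial orders. These dataclasses are illustrative only, not pm4py's actual POWL implementation; the point is that models are composed hierarchically from well-formed pieces, so soundness holds by construction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    label: str

@dataclass(frozen=True)
class Xor:
    """Exclusive choice between submodels."""
    options: tuple

@dataclass(frozen=True)
class Loop:
    """Do-redo loop over two submodels."""
    do: object
    redo: object

@dataclass
class PartialOrder:
    """Nodes plus ordering constraints; unordered pairs may run concurrently."""
    nodes: list
    edges: list  # (before, after) pairs

# "check credit" and "check stock" are unordered with respect to each other,
# so they may run concurrently; both must precede "ship".
credit = Activity("check credit")
stock = Activity("check stock")
ship = Activity("ship")
model = PartialOrder(nodes=[credit, stock, ship],
                     edges=[(credit, ship), (stock, ship)])
print(len(model.nodes), len(model.edges))
```

Because every submodel is a valid block on its own, composing them cannot produce a deadlocking model, which is exactly the guarantee a free-form notation like raw BPMN does not give.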

Fine-Tuning vs. Prompt Engineering: A Key Choice

A fundamental question in working with LLMs is how to best tailor them to a specific task:

  • Fine-Tuning: Retraining the LLM on a specific dataset. This is highly tailored but expensive and requires significant data.
  • Prompt Engineering: Crafting clever prompts to guide the LLM. This is more adaptable and versatile but requires skill in prompt design.

The LLM-Based Process Modeling Framework: A Closer Look

Our framework [2] is iterative and involves several key steps:

1. Prompt Engineering: This includes the following strategies:

  • Knowledge Injection: Providing the LLM with specific information about POWL and how to generate POWL models.
  • Few-Shot Learning: Giving the LLM examples of process descriptions and their corresponding models.
  • Negative Prompting: Telling the LLM what not to do, avoiding common errors.
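The three strategies can be combined into a single prompt. The sketch below is a hedged illustration of how such a prompt could be assembled; the exact wording and examples used by the framework differ.

```python
# Knowledge injection: tell the LLM what POWL is and how to use it.
KNOWLEDGE = ("POWL models are built from activities, partial orders, "
             "an exclusive-choice operator (XOR), and a loop operator.")

# Few-shot learning: show a worked description-to-model pair.
FEW_SHOT = ("Example:\nDescription: 'A is done, then B.'\n"
            "Model: partial_order(nodes=[A, B], order=[A -> B])")

# Negative prompting: state what the LLM must avoid.
NEGATIVE = "Do NOT invent activities that are absent from the description."

def build_prompt(description: str) -> str:
    """Assemble the full prompt from the three strategy components."""
    return "\n\n".join([KNOWLEDGE, FEW_SHOT, NEGATIVE,
                        f"Description: {description!r}\nModel:"])

print(build_prompt("An order is received, then it is approved or rejected."))
```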

2. Model Generation: The LLM generates executable code representing the POWL model. This code is then validated for correctness and compliance with the coding guidelines and the POWL specifications.

3. Error Handling: The system detects errors (both critical functional errors and less critical qualitative issues) and prompts the LLM to fix them.

4. Model Refinement: Users can provide feedback in natural language, and the LLM uses this feedback to improve the model.
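The generate-validate-fix loop at the heart of steps 2 and 3 can be sketched as follows. `generate` and `validate` are toy stand-ins (the first fakes an LLM that fails once before producing valid code; the second fakes the POWL and coding-guideline checks), not the framework's real components.

```python
def generate(prompt: str, attempt: int) -> str:
    """Stand-in for the LLM call: fails on the first attempt, then succeeds."""
    return "invalid(" if attempt == 0 else "sequence(['a', 'b'])"

def validate(code: str) -> list:
    """Return a list of detected errors; an empty list means the model is valid."""
    return [] if code.startswith("sequence") else ["syntax error in POWL code"]

def model_with_error_handling(prompt: str, max_attempts: int = 5) -> str:
    """Iteratively generate a model, feeding detected errors back to the LLM."""
    for attempt in range(max_attempts):
        code = generate(prompt, attempt)
        errors = validate(code)
        if not errors:
            return code
        # Error handling: append the detected errors to the next prompt.
        prompt += f"\nFix these errors: {errors}"
    raise RuntimeError("no valid model within the attempt budget")

print(model_with_error_handling("Model: a then b"))
```

Step 4 follows the same loop shape, except the "errors" are natural language feedback supplied by the user rather than automatically detected issues.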

Figure 2: LLM-Based process modeling framework.

ProMoAI: A Practical Tool

ProMoAI [3] is a tool that implements this framework. Key features of ProMoAI include:

  • Support for Multiple LLMs: It can work with LLMs from various AI providers, including Google, OpenAI, DeepSeek, Anthropic, and DeepInfra.
  • Flexible Input: Users can input text descriptions, existing models, or data.
  • Multiple Output Views: It can generate models in BPMN, POWL, and Petri net formats.
  • Interactive Feedback: Users can provide feedback and see the model updated in real-time.

Figure 3: ProMoAI (https://promoai.streamlit.app/).

Benchmarking: Which LLMs Perform Best?

We’ve benchmarked various state-of-the-art LLMs on their process modeling capabilities [4]. This involves testing quality (how well the generated models match the ground truth, using conformance checking with simulated event logs) and time performance (how long it took to generate the models).

Our extensive testing, using a diverse set of business processes, revealed significant performance variations across different LLMs. Some models consistently achieved higher quality scores, closely approaching the ideal, while others demonstrated faster processing times.
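As a toy illustration of the quality measure: simulate traces from the ground-truth model, then check how many of them the generated model can replay. Real benchmarks use proper conformance checking over event logs (fitness, precision, and related metrics); this sketch, with both languages given explicitly, only conveys the idea.

```python
# Traces simulated from the ground-truth model (each trace is a tuple of activities).
ground_truth_traces = [
    ("receive", "check", "ship"),
    ("receive", "check", "reject"),
]

# The language of the generated model, here listed explicitly for simplicity.
generated_language = {
    ("receive", "check", "ship"),
}

def fitness(simulated_log, model_language) -> float:
    """Fraction of simulated ground-truth traces the generated model can replay."""
    replayable = sum(1 for trace in simulated_log if trace in model_language)
    return replayable / len(simulated_log)

print(fitness(ground_truth_traces, generated_language))  # 0.5
```

Here the generated model misses the rejection path, so it replays only one of the two simulated traces and scores 0.5.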

Figure 4: Benchmarking results: average quality score, total time, time per iteration, and number of iterations for different LLMs.

Key Findings:

  • A crucial finding was a positive correlation between efficient error handling and overall model quality. LLMs that required fewer attempts to generate a valid, error-free model tended to produce higher-quality results overall.
  • Despite the variability in individual runs, we observed consistent quality trends within similar groups of LLMs. This implies that while specific outputs might differ, the overall performance level of a particular LLM type tends to be relatively stable.
  • Some speed-optimized models maintained quality comparable to their base counterparts, while others showed a noticeable drop in quality. This highlights the trade-offs involved in optimizing for speed.

Can LLMs Improve Themselves? Self-Improvement Strategies

We’re exploring whether LLMs can improve their own performance through self-evaluation and optimization. Several strategies are being investigated:

  1. LLM Self-Evaluation: The LLM evaluates and selects the best model from a set of candidates it generates. We found the effectiveness of this strategy to be highly dependent on the specific LLM. Some models showed improvement, while others performed worse after self-evaluation.
  2. LLM Self-Optimization of Input: The LLM improves the natural language description before generating the model. We found this approach to be generally not effective and could even be counterproductive. Our findings suggest LLMs may lack the specific domain knowledge needed to reliably improve process descriptions.
  3. LLM Self-Optimization of Output: The LLM refines the generated model itself. This strategy showed the most promise, particularly for models that initially produced lower-quality outputs. While average improvements were sometimes modest, we observed significant gains in specific instances. However, there was also a risk of quality degradation, emphasizing the need for careful prompt design to avoid unintended changes (hallucinations).
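Strategy 1 (self-evaluation) can be sketched as a best-of-n selection loop. `generate_candidate` and `self_score` below are hypothetical stand-ins with fixed outputs; a real system would make two kinds of LLM calls, one to generate each candidate model and one to rate it against the description.

```python
def generate_candidate(description: str, i: int) -> str:
    """Stand-in for an LLM call that produces the i-th candidate model."""
    return f"candidate-model-{i}"

def self_score(description: str, candidate: str) -> float:
    """Stand-in for an LLM call that rates how well a candidate matches
    the description; here the scores are simply hard-coded."""
    return {"candidate-model-0": 0.6,
            "candidate-model-1": 0.9,
            "candidate-model-2": 0.7}[candidate]

def best_of_n(description: str, n: int = 3) -> str:
    """Generate n candidates and keep the one the LLM rates highest."""
    candidates = [generate_candidate(description, i) for i in range(n)]
    return max(candidates, key=lambda c: self_score(description, c))

print(best_of_n("order handling"))  # candidate-model-1
```

As noted above, whether this selection step actually helps depends heavily on the specific LLM doing the scoring.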

Conclusion:

LLMs hold significant potential for transforming business process modeling, moving it from a traditionally manual and expert-driven task towards a more automated and accessible one. The framework we’ve developed, leveraging prompt engineering, a robust error-handling mechanism, and the sound intermediate representation of POWL, provides a viable pathway for translating natural language process descriptions into executable models in standard notations like BPMN. Our evaluation revealed not only variations in performance across different LLMs, but also consistent patterns. We found a notable correlation between efficient error handling and overall model quality and observed consistent performance trends within similar LLMs. We believe that the ability to translate natural language into accurate and useful process models, including executable BPMN diagrams, could revolutionize business operations.

References:

[1] Kourani, H., van Zelst, S.J. (2023). POWL: Partially Ordered Workflow Language. In: Di Francescomarino, C., Burattin, A., Janiesch, C., Sadiq, S. (eds) Business Process Management. BPM 2023. Lecture Notes in Computer Science, vol 14159. Springer, Cham. https://doi.org/10.1007/978-3-031-41620-0_6.

[2] Kourani, H., Berti, A., Schuster, D., van der Aalst, W.M.P. (2024). Process Modeling with Large Language Models. In: van der Aa, H., Bork, D., Schmidt, R., Sturm, A. (eds) Enterprise, Business-Process and Information Systems Modeling. BPMDS EMMSAD 2024. Lecture Notes in Business Information Processing, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-61007-3_18.

[3] Kourani, H., Berti, A., Schuster, D., & van der Aalst, W. M. P. (2024). ProMoAI: Process Modeling with Generative AI. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2024/1014.

[4] Kourani, H., Berti, A., Schuster, D., & van der Aalst, W. M. P. (2024). Evaluating Large Language Models on Business Process Modeling: Framework, Benchmark, and Self-Improvement Analysis. arXiv preprint. https://doi.org/10.48550/arXiv.2412.00023.

 
