INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS
Abstract
Process discovery techniques, which extract graph-like models from large process logs, are a valuable means of obtaining a summarized view of the behavior of real business processes. When augmented with statistics on process performance (e.g., processing times), such models help study how performance evolves across processing steps, and possibly detect bottlenecks and worst practices. However, when the analyzed process exhibits complex and heterogeneous behaviors, these techniques fail to yield good-quality models in terms of readability, accuracy and generality. In particular, the presence of deviant traces may lead to cumbersome models and misleading performance statistics. Current noise/outlier filtering solutions can alleviate this problem and help discover a better model for “normal” process executions, but they provide no insight into the deviant ones. As a result, difficult and expensive analyses are usually needed to extract patterns of deviant behavior that are interpretable and general enough. The performance-oriented discovery approach proposed here aims to recognize and describe both a normal execution scenario and the deviant ones for the analyzed process, by inducing different sub-models: (i) a collection of readable clustering rules (conjunctive patterns over trace attributes) defining the deviance scenarios; (ii) a performance model M0 for the “normal” traces that do not fall into any deviance scenario; and (iii) for each discovered deviance scenario, a performance model and a “difference” model emphasizing how its behavior differs from the “normal” execution scenario.
Technically, these models are discovered by exploiting a conceptual clustering method embedded in an iterative optimization scheme: the clustering procedure is devised to greedily find groups of traces that maximally deviate from M0, while the current version of M0 is replaced with the model extracted from the newly found normality cluster whenever the latter is more accurate than M0. Tests on real-life logs confirmed the validity of the approach and its ability to find good performance models and to support the analysis of deviant process instances.
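To make the iterative scheme more concrete, the following is a minimal Python sketch of its general shape, not the method evaluated in the paper. It rests on several simplifying assumptions: the performance model is reduced to the mean duration of a set of traces, accuracy is measured as mean absolute error, each clustering rule is a single attribute=value condition rather than a full conjunctive pattern, and only one deviance cluster is extracted per iteration. All names (`fit_model`, `error`, `best_rule`, `discover`, `min_support`, `min_deviation`) and the toy log are hypothetical.

```python
from statistics import mean


def fit_model(traces):
    """Toy performance model (assumption): the mean duration of the traces."""
    return mean(t["duration"] for t in traces)


def error(model, traces):
    """Mean absolute error of a model (one predicted duration) on the traces."""
    return mean(abs(t["duration"] - model) for t in traces)


def best_rule(traces, model, attributes, min_support):
    """Greedily pick the attribute=value rule whose covered traces
    deviate most, on average, from the current model."""
    best, best_dev = None, 0.0
    for attr in attributes:
        for value in sorted({t[attr] for t in traces}):
            covered = [t for t in traces if t[attr] == value]
            # Skip rules that are too rare or that cover every trace.
            if min_support <= len(covered) < len(traces):
                dev = error(model, covered)
                if dev > best_dev:
                    best, best_dev = (attr, value), dev
    return best, best_dev


def discover(traces, attributes, min_support=2, min_deviation=5.0, max_iter=10):
    """Alternate between finding a deviance rule and refitting M0."""
    normal = list(traces)
    m0 = fit_model(normal)
    rules = []
    for _ in range(max_iter):
        rule, dev = best_rule(normal, m0, attributes, min_support)
        if rule is None or dev < min_deviation:
            break  # no sufficiently deviant group is left
        attr, value = rule
        candidate_normal = [t for t in normal if t[attr] != value]
        candidate = fit_model(candidate_normal)
        # Replace M0 only if the refitted model is more accurate
        # on the newly found normality cluster.
        if error(candidate, candidate_normal) >= error(m0, candidate_normal):
            break
        rules.append(rule)
        normal, m0 = candidate_normal, candidate
    return rules, m0, normal


# Hypothetical toy log: one attribute value is associated with slow traces.
log = [
    {"channel": "web", "priority": "low", "duration": 10},
    {"channel": "web", "priority": "high", "duration": 11},
    {"channel": "web", "priority": "low", "duration": 9},
    {"channel": "web", "priority": "high", "duration": 10},
    {"channel": "fax", "priority": "low", "duration": 50},
    {"channel": "fax", "priority": "high", "duration": 52},
]
rules, m0, normal = discover(log, ["channel", "priority"])
print(rules, m0, len(normal))
```

In this sketch, a per-scenario performance model could be obtained by calling `fit_model` on the traces covered by each returned rule, and a crude "difference" from the normal scenario by subtracting `m0` from it; the graph-like models and difference models described in the abstract are considerably richer than this single number.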