Mining Process Models from Workæow Logs. Abstract. Modern enterprises increasingly use the workæow paradigm - PDF

Mining Process Models from Workæow Logs Rakesh grawal 1 and imitrios Gunopulos 2 and Frank Leymann 3 1 IBM lmaden Research enter, 650 Harry Rd., San Jose, 95120, US, 2 IBM lmaden

Please download to get full document.

View again

of 15
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.

Pets & Animals

Publish on:

Views: 14 | Pages: 15

Extension: PDF | Download: 0

Mining Process Models from Workæow Logs Rakesh grawal 1 and imitrios Gunopulos 2 and Frank Leymann 3 1 IBM lmaden Research enter, 650 Harry Rd., San Jose, 95120, US, 2 IBM lmaden Research enter, 650 Harry Rd., San Jose, 95120, US, 3 IBM German Software evelopment Lab, Hanns-Klemm-Str 45, Bíoblingen, Germany, bstract. Modern enterprises increasingly use the workæow paradigm to prescribe how business processes should be performed. Processes are typically modeled as annotated activity graphs. We present an approach for a system that constructs process models from logs of past, unstructured executions of the given process. The graph so produced conforms to the dependencies and past executions present in the log. By providing models that capture the previous executions of the process, this technique allows easier introduction of a workæow system and evaluation and evolution of existing process models. We also present results from applying the algorithm to synthetic data sets as well as process logs obtained from an IBM Flowmark installation. 1 Introduction Organizations typically prescribe how business processes have to be performed, particularly when activities are complex and involve many people. business process speciæes the way in which the resources of an enterprise are used. The performance of an enterprise depends on the quality and the accuracy of the business process. Thus techniques to manage and support business processes are an active research area. ërw92ë ës93ë ëghs95ë ël92ë ëmgk95ë. In particular, a signiæcant amount of research has been done in the area of modeling and supporting the execution of business processes. The model generally used is the workæow model ëhol94ë. Workæow systems assume that a process can be divided in small, unitary actions, called perform the process, one must perform the set èor perhaps a subsetè of the activities that comprise it. In addition, there may be dependencies between diæerent activities. The main approach used in workæow systems is to model the process as a directed graph. The graph vertices represent individual activities and the edges represent dependencies between them. In other words, if activity has to be executed before activity B, an edge appears in the graph from to B. In practice, certain executions of the process may include a given activity and others may not. ach edge! B is, therefore, annotated with a Boolean function that determines whether the control æows from to B. urrent workæow systems assume that a model of the process is available and the main task of the system is to insure that all the activities are performed in the right order and the process terminates successfully ëgr97ë ël92ë. The user is required to provide the process model. onstructing the desired process model from an unstructured model of process execution is quite diæcult, expensive and in most cases require the use of an expert ëpp96ë ësch93ë. ontribution We present a new approach to address the problem of model construction. We describe an algorithm that, given a log of unstructured executions of a process, generates a graph model of the process. The resulting graph represents the control æow of the business process and satisæes the following desiderata: í ompleteness: The graph should preserve all the dependencies between activities that are present in the log. It should also permit all the executions of the process present in the log. í Irredundancy: The graph should not introduce spurious dependencies between activities. í Minimality: To clarify the presentation, the graph should have the minimal number of edges. The work we present has been done in the context of the IBM workæow product, Flowmark ël92ë. However, the process model we consider is quite general and the algorithms we propose are applicable to other workæow systems. The new capability we are proposing can be applied in several ways. technique that takes logs of existing process executions and ænds a model that captures the process can ease the introduction of a workæow management system. In an enterprise with an installed workæow system, it can help in the evaluation of the workæow system by comparing the synthesized process graphs with purported graphs. It can also allow the evolution of the current process model into future versions of the model by incorporating feedback from successful process executions. The following schema is being adopted in Flowmark for capturing the logs of existing processes in an enterprise that does not yet have an workæow system in place. First, all the activities in a process are identiæed. But since the control æow is not yet known, all possible activities are presented to the user for consideration through a graphical interface. The user selects the activities that, according to user's informal model of the business process, have to be executed next. Thus the successful executions of the process are recorded. Related research The speciæcation of dependencies between events has received much attention ëkle91ë ës + 96ë. Our dependency model is a simpliæcation of that proposed in ëkle91ë, and is consistent with the directed graph process model. In previous work in process discovery ëw95ë ëw96ë, the ænite state machine model has been used to represent the process. Our process model is different from the ænite state machine model. onsider a simple process graph: èfs; ; B; g, fs!,!, S! B, B! gè, in which two activities and B can proceed in parallel starting from an initiating activity S and followed by a terminating activity. This process graph can generate SB and SB as valid executions. The automaton that accepts these two strings is a quite different structure. In an automaton, the activities èinput tokensè are represented by the edges ètransitions between statesè, while in a process graph the edges only represent control conditions and vertices represent activities. n activity appears only once in a process graph as a vertex label, whereas the same token èactivityè may appear multiple times in an automaton. The problem considered in this paper generalizes the problem of mining sequential patterns ës95ë ëmtv95ë, but it is applicable in a more restricted setting. Sequential patterns allow only a total ordering of fully parallel subsets, whereas process graphs are richer structures: they can be used to model any partial ordering of the activities and admit cycles in the general setting. On the other hand, we assume that the activities form only one graph structure, whereas in the sequential patterns problem the goal is to discover all patterns that occur frequently. Organization of the paper The rest of the paper is organized as follows. In Section 2we describe the process model used in the paper. In Section 3 we present an algorithm to ænd a process graph, assuming that the graph is acyclic and that each activity appears exactly once in each execution. The algorithm ænds the minimal such graph in one pass over the log. In Section 4 we extend this algorithm to handle the case where some activities may not appear in each execution. In Section 5 we consider the case of general directed graphs admitting cycles. In these sections, we make the assumption that the log contains correct executions of the business process. However, this may not be the case in practice, and we outline a strategy to deal with this problem in Section 6. Section 7 presents implementation results using both synthetic datasets and logs from a Flowmark installation. We conclude with a summary in Section 8. 2 Process model Business processes consist of separate activities. n activity is an action that is a semantical unit at some level. In addition, each activity can be thought of as a function that modiæes the state of the process. Business processes are modeled as graphs with individual activities as nodes. The edges on the graph represent the potential æow of control from one activity to another 4. ach edge is associated with a Boolean function èon the state of the processè, which determines whether the edge will be followed or not. If a vertex èactivityè has more than one outgoing edge, the respective Boolean functions are independent from each other. eænition 1 èbusiness processè. business process P is deæned as a set of activities V P = V 1 ;:::;V n, a directed graph G P =èv P ; P è, an output function o P : V P!N k, and 8èu; vè 2 P a Boolean function f èu;vè : N k!f0; 1g. We will assume that G P has a single source and a single sink. These are the process' activating and terminating activities. If there are no such activities, 4 For the purposes of this paper, we will not diæerentiate between control æow and data æow, a distinction made in some systems ëgr97ë ël92ë. B Fig. 1. xample 1 one can add an activating node with edges to the ærst executed activities in the graph, and a terminating node with edges from the terminating activities of the process. The execution of the business process follows the activity graph: for each activity u that terminates, the output oèuè is computed. Then the functions on the outgoing edges are evaluated on the output. If f èu;vè èoèuèè is true, then we test if v can be executed. This test in general is a logical expression involving the activities that point to v in G. When v is ready, the outputs of incoming activities are passed as input to v, and it is inserted into a queue to be executed by the next available agent. xample 1. Figure 1 gives the graph G P of a process P. The process consists of æve activities V P = f; B; ; ; g. is the starting activity and is the terminating activity. The edges of the graph G P è P = fè; Bè, è; è, èb; è, è; è, è; è, è; èg represent the æow of execution, so that always follows, but B and can happen in parallel. Not shown in Figure 1 are o P and the Boolean conditions on the edges. ach activity has a set of output parameters that are passed along the edges, oèè;:::;oèè 2N 2. The output parameters are represented as a vector èoèèë1ë;oèèë2ëè. ach edge has a Boolean function on the parameters, such as: f è;è =èoèèë1ë é 0è ^ èoèèë2ë éoèèë1ëèè. For example an execution of this process will include activity if f è;è and f è;è are true. ach execution of a process is a list of events that record when each activity was started and when it terminated. We can therefore consider the log as a set of separate executions of an unknown underlying process graph. eænition2 èxecution logè. The log of one execution of a process èor simply executionè is a list of event records èp; ; ; T; Oè where P is the name of the process execution, is the name of the activity, 2fSTRT, Ng is the type of the event, T is the time the event occured, and O = oèè is the output of the activity if = N and a null vector otherwise. For notational simplicity, we will not write the process execution name and output in the event records. We assume that the activities are instantaneous and no two activities start at the same time. With this simpliæcation, we can represent an execution as a list of activities. This simpliæcation is justiæed because if there are two activities in the log that overlap in time, then they must be independent activities. s we will see, the main challenge in inducing a process graph from a log of past executions lies in identifying dependency relationship between activities. xample 2. Sample executions of the graph in Figure 1 are B, B,. If there exists a dependency between two activities in the real process, then these two activities will appear in the same order in each execution. However only the executions that are recorded in the log are known, and so we deæne a dependency between two activities with respect to the log. In the model graph, each dependency is represented either as a direct edge or as a path of edges from an activity to another. eænition 3 èfollowingè. Given a log of executions of the same process, activity B follows activity if either activity B starts after terminates in each execution they both appear, or there exists an activity such that follows and B follows. eænition 4 èependence between activitiesè. Given a log of executions of the same process, if activity B follows but does not follow B, then B depends on. If follows B and B follows, or does not follow B and B does not follow, then and B are independent. xample 3. onsider the following log of executions of some process: fb,, Bg. The activity B follows èbecause B starts after in the two executions both of them appearè but does not follow B, therefore B depends on. On the other hand, B follows èbecause it is recorded after in the only execution that both are presentè and follows B èbecause it follows, which follows Bè, therefore B and are independent. Let us add to the above log. Now, B and are no longer independent; rather, B depends on. It is because B follows as before, but and are now independent, so we do not have following B via. Given a log of executions, we can deæne the concept of a dependency graph, that is, a graph that represents all the dependencies found in the log. eænition5 èependency graphè. Given a set of activities V and a log of executions L of the same process, a directed graph G VL is a dependency graph if there exists a path from activity u to activity v in G VL if and only if v depends on u. In general, for a given log, the dependency graph is not unique. In particular, two graphs with the same transitive closure represent the same dependencies. very execution of the process recorded in the log may not include all the activities of the process graph. This can happen when not all edges outgoing from an activity are taken èe.g. the execution in Figure 2è. n execution R induces a subgraph G 0 of the process graph G =èv; è in a natural way: G 0 =èv 0 ; 0 è, where V 0 = fv 2 V j v appears in Rg and 0 = fèv; uè 2 j v terminates before u starts in Rg. B B Fig. 2. xample 5 eænition6 èonsistency of an executionè. Given a process model graph G =èv; è of a process P and an execution R, R is consistent with G if the activities in R are a subset V 0 of the activities in G, and the induced subgraph G 0 =èv 0 ; fèu; vè 2 j u; v 2 V 0 gè is connected, the ærst and last activities in R are process' initiating and terminating activities respectively, all nodes in V 0 can be reached from the initiating activity, and no dependency in the graph is violated by the ordering of the activities in R. This deænition of consistency is equivalent to the following one: R can be a successful execution of P for suitably chosen activity outputs and Boolean edge functions. xample 4. The execution B is consistent with the graph in Figure 1, but B is not. Given a log of executions, we want to ænd a process model graph that preserves all the dependencies present in the log. t the same time, we do not want the graph to introduce spurious dependencies. The graph must also be consistent with all executions in the log. graph that satisæes these conditions is called a conformal graph. eænition7 èonformal graphè. process model graph G is conformal with a log L of executions if all of the following hold: í ependency completeness: For each dependency in L, there exists a path in G. í Irredundancy of dependencies: There is no path in G between independent activities in L. í xecution completeness: G is consistent with every execution in L. xample 5. onsider the log f, Bg. Both the graphs in Figure 2 are dependency graphs. The ærst graph is conformal, but the second is not because it does not allow the execution. Problem statement. We deæne the following two problems: Problem 1: Graph mining. Given a log of executions of the same process, ænd a conformal process graph. Problem 2: onditions mining. Given a log of executions of the same process and a corresponding conformal process graph G = èv; è, ænd the Boolean functions f èu;vè ; èu; vè 2. In this paper we will only consider Problem 1. Problem 2 is the subject of ongoing research; some preliminary ideas can be found in ëgl97ë. ssume throughout that the process graph has jv j = n vertices, and the log contains m separate executions of the process. Generally, m n. In Sections 3 and 4, we will assume that the process graph is acyclic. This assumption is reasonable in many cases and, in fact, it is also frequently the case in practice ël92ë. We will relax this assumption in Section 5 and allow for cycles in the process graph. 3 Finding directed acyclic graphs We ærst consider the special case of ænding model graphs for acyclic processes whose executions contain exactly one instance of every activity. For this special case, we can obtain a faster algorithm and prove the following minimality result: Given a log of executions of the same process, such that each activity appears exactly once in each execution, there exists a unique process model graph that is conformal and minimizes the number of edges. Lemma 1. Given a log of executions of the same process, such that each activity appears in each execution exactly once, if B depends on then B starts after terminates in every execution in the log. Proof. ssume that this is not the case. Then there exists an execution such that B starts before terminates. From the deænition of dependency, there must be a path of followings from to B. But since all activities are present in each execution, there must be at least one following which does not hold for the execution where B starts before, a contradiction. 2 Lemma 2. Let G and G 0 begraphs with the same transitive closure. Then both graphs are consistent with the same set of executions if each activity appears exactly once in each execution. Proof. Since every activity appears in each execution, the induced subgraph for any execution is the original graph. The two graphs have the same transitive closure, so if there is a path between two activities in one, then there is a path between the same activities in the other. It follows that if a dependency is violated in one graph, then it must be violated in the other one. 2 Lemma 3. Given a log of executions of the same process, where all activities appear in each execution once, and a dependency graph G for this log, G is conformal. B B Fig. 3. xample 6 Proof. By deænition, the dependency graph preserves all the dependencies present in the log, and none other. For a given execution in the log, the induced subgraph is again the graph G because all activities are present. Further, no dependency is violated because if one was, it would not be in the dependency graph. It follows that G is conformal. 2 We can now give an algorithm that ænds the minimal conformal graph. lgorithm 1 èspecial Gè. Given a log L of m executions of a process, ænd the minimal conformal graph G, assuming there are no cycles in the graph and each activity appears in each execution of the process. 1. Start with the graph G =èv; è, with V being the set of activities of the process and = ;. èv is instantiated as the log is scanned in the next step.è 2. For each process execution in L, and for each pair of activities u; v such that u terminates before v starts, add the edge èu; vè to. 3. Remove from the edges that appear in both directions. 4. ompute the transitive reduction 5 of G. 5. Return èv; è. Theorem 1. Given a log of m executions of a given process having n activities, lgorithm 1 computes the minimal conformal graph in Oèn 2 mè time. Proof. First we show that after step 3, G is a dependency graph. From Lemma 1we know that the graph after step 2 at least contains an edge corresponding to every dependency. Since the edges we remove in step 3 form cycles of length 2, where there are activities u and v such that u follows v and v follows u, such edges cannot be dependencies. fter step 4, G is the minim
Related Search
Similar documents
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks