Publisher's version: Proceedings of the IADIS International Conference Applied Computing 2009, 2009-11-21. NRC Publications Archive record: https://nrc-publications.canada.ca/eng/view/object/?id=de3ce930-9f8c-4063-b72d-52fa1b182277

A MACHINE LEARNING APPROACH TO IDENTIFYING PROCESS INSTANCE AND ACTIVITY

Hongyu Liu, Yunli Wang, Liqiang Geng, Matthew Keays, Nicholas Maillet

Institute for Information Technology, National Research Council Canada, 46 Dineen Drive, Fredericton, NB, Canada E3B 9W4

ABSTRACT

Current process mining techniques assume a structured event log to be in place. The log typically contains explicit information about events referring to an instance and an activity. However, this is not always the case: many structured work or business processes are managed via Email-based communication inside and between organizations. In this paper, we address the problem of automatically identifying process instances and activities from unstructured event logs such as Emails using machine learning algorithms. The work focuses on two distinct subproblems. 1) Instance identification: automatically partitioning logs into clusters, each of which is a distinct instance. Our approach examines a short time window of the streaming event log to find the clusters, without examining the whole event log. 2) Activity identification: identifying the activities in each instance. Our approach models activity identification as a sequence labelling task based on Conditional Random Fields (CRFs). We have conducted experiments on synthetic logs with different noise types to study their impact on the effectiveness of our approach.

KEYWORDS

Process mining, workflow, activity management, machine learning, clustering, classification, conditional random fields.

1. INTRODUCTION

Process mining [4] extracts knowledge from event logs and is used as a tool to find out how people and procedures really work, supporting tasks such as process model discovery, conformance checking, and performance analysis. Most of the research assumes that event logs are automatically collected by modern workflow management systems and contain detailed process-related information; that is, they sequentially record all the activities being executed, with explicit activity names (i.e., well-defined steps in the process model) as well as the instances they belong to. For example, consider the following car assembly event log: {(car1, InstallEngine, 9:00am), (car2, AddSeat, 9:30am), (car1, InstallDoors, 10:30am), (car2, InstallDoors, 11:00am)}. It contains two process instances (car1 and car2), each of which consists of a series of process activities that were performed (InstallEngine, InstallDoors, and AddSeat are the activity labels). Event logs also contain additional information about the events, such as the originator and the timestamp of an activity, or other data elements. However, such a restrictive assumption about the richness of information available in event logs does not hold in many scenarios. Many structured work or business processes are managed via Email-based communication inside and between organizations. For instance, an employee applying for travel may send and receive a series of Email messages: send the request, receive approval from the supervisor, and inform the secretary to arrange the trip or claim travel expenses. A consumer purchasing online may receive messages in the context of e-commerce transactions. Since there is no process-related information (explicit process instance and activity labels) directly attached to such message logs, identifying instances and activities is the first step towards effective workflow management when applying process mining techniques to such unstructured event logs.

This is not an easy task, because activities belonging to different instances overlap chronologically. Two messages with high content similarity corresponding to the same process activity may belong to different instances, while two messages with low similarity may be associated with different activities but belong to the same instance. Furthermore, some workflow management tasks, such as conformance checking or staff assignment, may require a decision to be made immediately, in real time. Consider the following conformance checking question over streaming email data: given a process model, when a new message arrives, does it conform to the expected behaviour? It requires a quick and correct decision on predicting the instance and activity with which an observed behaviour is associated, which is critical to the successful detection of abnormal process executions in real-time alert systems.

Little work has been done on automatically identifying process instances and activities from unstructured event logs [1, 2, 3, 8, 9]. The issue has been addressed in some works, but only to a very limited extent, by directly or indirectly making use of unique identifiers that appear in the messages. In [2], when dealing with Web service event logs, SOAP messages were grouped by taking advantage of chained correlation information that a message has in common with other messages belonging to the same service instance, which can easily be obtained in the context of contemporary web service standards. When identifying activities corresponding to e-commerce transactions, as shown in [1], a unique alphanumeric purchase code contained in most messages was relied on to group the messages into transaction cases. [3] assumes that explicit references (company names as instance names, and agreements as activity labels) are already tagged in the messages' Subject lines, which is not general enough to apply to arbitrary E-mail logs.

This paper addresses the problem of identifying process instances and activities from unstructured event logs using machine learning algorithms. The first task is to automatically partition a set of raw messages into clusters, each of which is a distinct instance of a process. We propose a supervised sequential clustering approach to solve this problem. Within each cluster, activity identification is treated as a sequence labelling problem, where a label is a defined activity step (i.e., it describes what has happened) in a process model and the actual message is the observation. Our approach differs from other works in two ways. First, we apply machine learning algorithms to identify instances and activities rather than relying on special identifiers appearing in the messages, making our approach applicable to arbitrary unstructured event logs. Second, we take both the streaming nature of the data and the context of the email sequence into consideration when classifying a new message, rather than only the content of each message in isolation.

The remainder of this paper is organized as follows. In Section 2, we give an overview of the travel planning process model and the corpus used for our research and evaluation. Section 3 describes the overall approach. Section 4 describes the algorithms and empirical results for instance identification. Then, in Section 5, we discuss how to apply Conditional Random Fields (CRFs) [7] for activity prediction. Conclusions and future work are given in Section 6.

2. PROCESS MODEL AND DATA SETS

1. Employee applies for travel. 2. Group leader denies the application. 3. Group leader approves the application. 4. Travel authority books flight. 5. Travel authority informs employee of the itinerary. 6. Employee approves the itinerary. 7. Employee denies the itinerary. 8. Secretary books hotel. 9. Secretary rents car. 10. Secretary informs employee of the travel arrangement. 11. Employee approves the travel arrangement. 12. Employee denies the travel arrangement. 13. Employee claims travel expense.

Fig. 1. Travel Planning Process Model [diagram omitted; the legend above lists its 13 activity steps]

A process model consists of a set of activities and the transitions between them. As a detailed case study, we use the travel planning process model of an organization, shown in Fig. 1. It consists of 13 defined process activity steps. For brevity, we use the activity labels Apply1, Deny1, Approve1, BookFlight1, InformEmployee1, Approve2, Deny2, BookHotel1, RentCar1, InformEmployee2, Approve3, Deny3, and ExpenseClaim to denote activities 1 to 13, respectively.

All of these activities are performed by sending E-mail messages which contain all the information associated with them. Each message contains Email structure features (Originator, Recipients, Subject, and Timestamp) and Email content features (Traveler's Names, Keywords, Travel Dates, and Destinations). The simulated data sets were generated in two steps. First, structured event logs were generated from the process model (Fig. 1). Then, all Email features associated with each message were generated. For performance evaluation purposes only, the true instance label and activity label are also included in each message to create a "gold standard" against which our algorithms are evaluated. As a result, all the information about an activity within a message forms an audit trail entry of the event log.

For the experiments, we applied two kinds of noise: attribute noise and instance noise.

Attribute noise is applied to the attributes (Traveler's Names, Keywords, Travel Dates, and Destinations) by specifying the probability that each word in each attribute is changed. A word affected by attribute noise is either removed or replaced by a random word drawn from a large word list. For instance noise, we used 5 different noise types adapted from [5]. Let x be an instance sequence, x = {x1, x2, ..., xn}, where xi is an activity in the form of an email message.

a. Addition noise: randomly inserts a randomly chosen activity at a random position in x.
b. Deletion noise: randomly removes one activity from the sequence x.
c. Replacement noise: randomly replaces one activity in the sequence x with a randomly chosen activity.
d. Swap noise: swaps two adjacent activities in the sequence x.
e. Random noise: randomly applies one of the four noise types above to the sequence x (the selection probabilities sum to 1).

The instance noise level specifies the percentage of instances, chosen at random, to which one of the 5 noise types above is applied.
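For illustration, the following Python sketch shows one way such instance noise could be injected; the function names and the representation of an activity as a plain label string are our own assumptions, not the generator actually used for the experiments.

import random

# Hypothetical vocabulary of activity labels from the model in Fig. 1.
ACTIVITIES = ["Apply1", "Deny1", "Approve1", "BookFlight1", "InformEmployee1",
              "Approve2", "Deny2", "BookHotel1", "RentCar1", "InformEmployee2",
              "Approve3", "Deny3", "ExpenseClaim"]

def add_noise(seq):
    # a. Addition: insert a randomly chosen activity at a random position.
    pos = random.randint(0, len(seq))
    return seq[:pos] + [random.choice(ACTIVITIES)] + seq[pos:]

def delete_noise(seq):
    # b. Deletion: remove one randomly chosen activity.
    pos = random.randrange(len(seq))
    return seq[:pos] + seq[pos + 1:]

def replace_noise(seq):
    # c. Replacement: overwrite one activity with a randomly chosen one.
    seq = list(seq)
    seq[random.randrange(len(seq))] = random.choice(ACTIVITIES)
    return seq

def swap_noise(seq):
    # d. Swap: exchange two adjacent activities.
    if len(seq) < 2:
        return list(seq)
    seq = list(seq)
    i = random.randrange(len(seq) - 1)
    seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq

def random_noise(seq):
    # e. Random: apply one of the four noise types, chosen with equal
    # probability (the selection probabilities sum to 1).
    return random.choice([add_noise, delete_noise, replace_noise, swap_noise])(seq)

def apply_instance_noise(instances, noise_fn, level):
    # `level` is the fraction of instances to corrupt, e.g. 0.1 for 10%.
    chosen = set(random.sample(range(len(instances)), int(level * len(instances))))
    return [noise_fn(seq) if i in chosen else seq for i, seq in enumerate(instances)]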

The training data consists of 500 process instances containing 12,879 audit trail entries. Each of the three testing data sets consists of 200 process instances, containing about 5,200 audit trail entries in random order. For every noise type, we used five noise levels: 0%, 10%, 20%, 30% and 40%; therefore, every testing data set has 5*5*5 = 125 noisy logs.

3. OVERVIEW OF APPROACH

In this paper, we focus on the problem of identifying process instances and activities from unstructured event logs such as Emails. The challenges are that (1) there are no special identifiers appearing in the messages that can be used directly to distinguish different instances and activities, and (2) the decision on instance identification and activity prediction has to be made immediately, based only on information gleaned from previous messages in the stream. Our methodology consists of two distinct steps:

1. Instance Identification: Given a newly seen audit trail entry in the stream of a process event log, the task is to identify which process instance it belongs to. For example, message 1 and message 3 are associated with Mary's trip to attend the AI2009 conference (Instance #1), while message 2 is identified as part of John Smith's trip to the DM2010 meeting (Instance #2). That is, the task is to automatically partition the log into clusters, each of which is a distinct process instance going through a series of messages.

2. Activity Identification: For each instance (cluster), the next step is to identify the process activities with which the messages are associated. For example, message 1 is BookFlight1 and message 3 is Approve1 (both in Instance #1), while message 2 is ExpenseClaim (in Instance #2).

4. INSTANCE IDENTIFICATION

The first task towards process mining on an unstructured event log is to group activities belonging to the same instance into clusters, each of which represents a distinct instance of the process. This is not an easy task, because messages belonging to different instances overlap chronologically, and there are no automatically detectable unique identifiers that are common to all messages in a single instance but do not appear in other instances.

A natural approach is to examine all the messages in the event log and cluster groups of similar messages. However, traditional methods need to compare all pairs of messages for a one-time clustering, which is not efficient for dynamic streaming data. We have observed that the messages in each instance are few in number and span a short time window in the log stream before the stream moves on to another instance. Inspired by work on supervised clustering of streaming data for Email batch detection [6], we propose a supervised sequential clustering approach for the instance identification task. As a result, examining a short time window of the event log is expected to find the clusters rapidly and efficiently, without examining the whole event log.

4.1 Approach

Our approach is to use a sliding window method to train an SVM pairwise classifier. We define the time window size n as the pre-defined number of messages in each window, and the window slide s as the number of messages by which the time window is shifted.

For training, each pair of messages in the time window X = {x1, x2, ..., xn} forms a training example, with label = 1 if they belong to the same instance and label = 0 otherwise. Feature values are the Cosine similarity values of the feature functions fi(xj, xk), defined as follows:

f1(xj, xk) = similarity of <Originator, Recipients> between messages xj and xk: compares the set of people participating in an instance with the set of people to which the emails xj and xk are addressed.
f2(xj, xk) = similarity of the Subject of messages xj and xk.
f3(xj, xk) = similarity of the Travel Dates appearing in messages xj and xk.
f4(xj, xk) = similarity of the Keywords included in messages xj and xk.
f5(xj, xk) = similarity of the Destinations appearing in messages xj and xk.
f6(xj, xk) = similarity of the Traveler's Names appearing in messages xj and xk.

Each message is compared against all other messages in the window. We use the sliding window method to reduce the total number of message comparisons. If the window slide s = 1, training examples are generated from every combination of two messages in the log. All the training examples and their associated class labels are used to train an SVM classifier.
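A minimal sketch of this pairwise training setup is given below, using scikit-learn's SVC; the message field names and the set-based cosine helper are illustrative assumptions, not the authors' exact feature code.

from itertools import combinations
from sklearn.svm import SVC

def cosine_sim(a, b):
    # Cosine similarity between two bags of tokens (binary vectors here,
    # represented as sets for brevity).
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** 0.5 * len(b) ** 0.5)

# Hypothetical message fields corresponding to feature functions f1..f6.
FIELDS = ["participants", "subject", "travel_dates",
          "keywords", "destinations", "traveler_names"]

def pair_features(msg_j, msg_k):
    # One Cosine-similarity value per feature function f1..f6.
    return [cosine_sim(msg_j[f], msg_k[f]) for f in FIELDS]

def train_pairwise_svm(windows):
    # `windows` is a list of time windows; each window is a list of
    # (message, instance_id) pairs, as produced by the sliding window.
    X, y = [], []
    for window in windows:
        for (m1, id1), (m2, id2) in combinations(window, 2):
            X.append(pair_features(m1, m2))
            y.append(1 if id1 == id2 else 0)  # same instance -> positive
    clf = SVC(kernel="linear")
    clf.fit(X, y)
    return clf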

Suppose the current time window is X = {x1, x2, ..., xn}, where n is the size of the window. When a new message e = xn+1 is seen, it is processed by either assigning it to one of the existing clusters or creating a new cluster, based on the classification results of the learned SVM classifier. Let the classification results be Y = {y1,n+1, y2,n+1, ..., yn,n+1}, where yk,n+1 represents the classification result between messages xn+1 and xk (k = 1..n): yk,n+1 = 1 if messages xk and xn+1 are classified into the same instance, and 0 otherwise. Then xn+1 is put in whichever cluster has the highest vote, or a new cluster is created if all votes are 0. The detailed algorithm is described as follows:

Given a new message e = x_{n+1} and the current set of clusters C:

    c_j ← argmax_{c ∈ C} Σ_{x_k ∈ c} y_{k,n+1}
    if C = {} or Σ_{x_k ∈ c_j} y_{k,n+1} <= 0 then
        C ← C ∪ {{x_{n+1}}}                       // all votes are 0: start a new cluster
    else
        C ← (C \ {c_j}) ∪ {c_j ∪ {x_{n+1}}}       // join the highest-vote cluster
    endif
    return C

(C is initialized to {} and the step above is repeated for each message arriving in the stream.)
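For illustration, this assignment step might be sketched in Python as follows, reusing the hypothetical pair_features helper and the trained classifier from the previous sketch; the dict-based cluster bookkeeping is our own simplification, not the authors' implementation.

def assign_message(clusters, window, new_msg, clf):
    # `clusters` maps cluster id -> list of messages; `window` is the list
    # of messages in the current time window; `clf` and pair_features are
    # the pairwise SVM and feature helper from the previous sketch.
    votes = {}
    for cid, members in clusters.items():
        # Each positive pairwise prediction is one vote for that cluster.
        votes[cid] = sum(
            clf.predict([pair_features(new_msg, m)])[0]
            for m in members if m in window)
    best = max(votes, key=votes.get) if votes else None
    if best is None or votes[best] <= 0:
        clusters[len(clusters)] = [new_msg]   # all votes 0: new cluster
    else:
        clusters[best].append(new_msg)        # highest-vote cluster wins
    return clusters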


4.2 Evaluation

We want to assign two messages to the same cluster if and only if they belong to the same instance. We compare the predicted instance label produced by our algorithm against the true instance label contained in each message (described in Section 2). A true positive (TP) decision assigns two messages belonging to the same instance (called same-instance messages) to the same cluster. A false positive (FP) decision assigns two different-instance messages to the same cluster. A false negative (FN) decision assigns two same-instance messages to different clusters. The F1 measure is used for evaluation in our experiments. It is calculated as F1 = 2PR/(P+R), where P = TP/(TP+FP) and R = TP/(TP+FN).

Fig. 2. F1 measures for logs with instance noise I=0.3 and 5 attribute noise levels (K=0.0-0.4), using 10 different time window sizes. [chart omitted]

Fig. 3. F1 measures for logs with all combinations of 5 instance noise levels (I=0.0-0.4) and 5 attribute noise levels (K=0.0-0.4). [chart omitted]
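A minimal sketch of this pairwise evaluation, assuming the predicted and true instance labels are given as parallel lists (our own framing of the computation, consistent with the definitions above):

from itertools import combinations

def pairwise_f1(pred_labels, true_labels):
    # Count pair decisions: TP = same-instance pair placed in the same
    # cluster, FP = different-instance pair merged, FN = same-instance
    # pair split across clusters.
    tp = fp = fn = 0
    for i, j in combinations(range(len(pred_labels)), 2):
        same_pred = pred_labels[i] == pred_labels[j]
        same_true = true_labels[i] == true_labels[j]
        if same_pred and same_true:
            tp += 1
        elif same_pred and not same_true:
            fp += 1
        elif same_true and not same_pred:
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0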

For training, a window size of 100 was used. We conducted 250 experiments for testing, i.e., all possible combinations of 5 instance noise levels (I=0.0-0.4), 5 attribute noise levels (K=0.0-0.4), and 10 time window sizes (10, 20, ..., 100). Fig. 2 shows some of the results. We only show the results for I=0.3 because the other noise levels lead to the same conclusions.

We first look at the effect of the time window size on performance. There is a significant performance improvement when the window size increases from 10 to 30, and there is not much difference in performance with a window size larger than 40, indicating that 40 is the best window size to choose. When examining the mis-clustered messages, we found that all the errors are false negatives: with a small window size, the messages in each instance are too spread out to be clustered correctly. The reason is that with a small window size, a message which is not the starting activity of a process may be assigned to a new cluster, since it does not belong to any cluster appearing in the current time window. As a result, one instance may be split into several clusters, causing a high false negative rate. The problem is solved by increasing the window size until a certain demarcation point is reached, beyond which the performance no longer improves. The results therefore validate our hypothesis that examining only a short time window finds the clusters effectively and rapidly compared to examining the whole event log.

To test how the quality of the algorithm degrades with noise, it is necessary to vary the noise level in the data. Here, a noise level of 0.0 means that the data has no noise. An instance noise level of 0.1 means that one of the 5 noise types is applied to 10% of the instances; an attribute noise level of 0.2 means that 20% of the words have noise applied, i.e., each affected word is either removed from the attribute or replaced by a word randomly selected from a word list. Fig. 3 shows the effects of instance noise I and attribute noise K on the performance. As expected, instance noise I makes little difference to the performance, since the content of a message rather than its context is used to identify the instances. The performance decreases gradually with increasing attribute noise K. When the attribute noise level K <= 0.2, the algorithm gives high accuracy (F1 > 0.87). Even at higher noise levels, the performance remains good.

5. ACTIVITY IDENTIFICATION

For each cluster, which is an instance represented as a sequence of messages, the next step is to identify the activities with which the messages are associated. The context along the activity sequence, including not only message content but also sequential relations, reveals the structure of the underlying activity transitions; therefore, identifying the activities can be treated as a sequence labelling problem, where a label is a defined activity step in a process model. We model the execution of a process as a finite-state automaton over an underlying chain of hidden states, where activities are hidden states and the actual messages are observations. The goal is to label each message based on the context of the email sequence so that the likelihood of the label sequence given the whole observation sequence is maximized.

5.1 Approach

We apply Conditional Random Fields (CRFs) [7], a state-of-the-art sequence labelling method. Given an observation sequence of messages X = (x1, x2, x3, ..., xn) and the corresponding activity label (state) sequence Y = (y1, y2, y3, ..., yn), where n is the length of the sequence and yi ∈ {Apply1, Deny1, Approve1, BookFlight1, InformEmployee1, Approve2, Deny2, BookHotel1, RentCar1, InformEmployee2, Approve3, Deny3, ExpenseClaim}, the probability of Y conditioned on X, P(Y|X), is defined as:

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{i=1}^{n} \sum_{j=1}^{k} \lambda_j f_j(y_{i-1}, y_i, X) \right) \quad (1)$$

where Z(X) is a normalization factor and λ = {λ1, ..., λk} are the weights learned for the feature functions f_j(y_{i-1}, y_i, X), j = 1..k, which describe the relationships between y_{i-1}, y_i, and X. We defined the following features:

Position feature: specifies the position of xi in the sequence. It is set to 1 if xi is the start of the sequence, 3 if xi is the end of the sequence, and 2 otherwise. This feature function therefore gives a good insight into the start and end of the sequences.

Keyword feature: describes important words contained in message xi, such as "book, flight, time, departure" in BookFlight1, or "claim, total, net, cost" in ExpenseClaim.

Context features: define the relation between words in an activity entry in the sequence and those of its neighbors. In our work, we set the window size to three for this feature. That is, the attribute similarities sim(ai, ar) between the current entry xi and its next three and previous three neighbor entries are calculated, where ai and ar are the attribute vectors of xi and xr respectively, i-3 <= r <= i+3. We used this feature function with six sets of attributes: Traveler's Names, Travel Dates, Destinations, Keywords, Originator, and Recipients. Because of their relation, the Originator and Recipients attributes were used together, in two separate ways. First, we considered the similarity of the Originator and the Recipients of one entry to another, dependent on which attribute was the originator and which was the recipient. We then considered the same similarity independent of the attribute type.
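The paper does not name a particular CRF implementation. As a hedged sketch, the features above could be assembled into per-message feature dicts and a linear-chain CRF trained with the sklearn-crfsuite package; the message field names and the cosine_sim helper are assumptions carried over from the earlier sketches.

import sklearn_crfsuite  # pip install sklearn-crfsuite

def message_features(seq, i):
    # Build the feature dict for position i: position feature, keyword
    # features, and context similarities to the three previous and three
    # next messages (cosine_sim as defined in the earlier sketch).
    msg = seq[i]
    feats = {
        "position": 1 if i == 0 else (3 if i == len(seq) - 1 else 2),
    }
    for w in msg["keywords"]:
        feats["kw_" + w] = 1.0
    for r in range(max(0, i - 3), min(len(seq), i + 4)):
        if r == i:
            continue
        for field in ("traveler_names", "travel_dates", "destinations",
                      "keywords", "participants"):
            feats[f"sim_{field}_{r - i}"] = cosine_sim(msg[field], seq[r][field])
    return feats

def train_crf(sequences, label_sequences):
    # Each training example is one instance: a message sequence plus its
    # gold activity label sequence (Apply1 ... ExpenseClaim).
    X = [[message_features(seq, i) for i in range(len(seq))] for seq in sequences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X, label_sequences)
    return crf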


Given the conditional probability of the state sequence defined by the CRF in (1) and the parameters λ, the inference task is to estimate the probability of a message x having a given activity label y. The most probable label sequence for an input sequence X is Y* = argmax_Y P(Y|X).

Because Z(X) does not depend on Y, the most likely Y can be found with the Viterbi algorithm. The forward value αi(y) is defined as the best score (the highest probability) of a label sequence ending with label y at position i, given the observation sequence so far; it can be calculated recursively:

$$\alpha_i(y) = \max_{y'} \left[ \alpha_{i-1}(y') \exp\left( \sum_{j} \lambda_j f_j(y', y, X) \right) \right]$$
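For illustration, the recursion can be implemented directly; the sketch below assumes a score(i, y_prev, y) function returning Σj λj fj(y_prev, y, X) at position i (e.g., computed from learned weights), and works in log space, where the exponential and product become sums and Z(X) can be ignored because it does not depend on Y.

def viterbi(labels, score, n):
    # Viterbi decoding for the linear-chain CRF: alpha[y] holds the best
    # log-score of any label sequence ending in y at the current position.
    alpha = {y: score(0, None, y) for y in labels}  # alpha_1(y)
    backpointers = []
    for i in range(1, n):
        prev, alpha, ptr = alpha, {}, {}
        for y in labels:
            best = max(prev, key=lambda yp: prev[yp] + score(i, yp, y))
            alpha[y] = prev[best] + score(i, best, y)
            ptr[y] = best
        backpointers.append(ptr)
    # Backtrack from the best final label to recover Y* = argmax P(Y|X).
    y = max(alpha, key=alpha.get)
    path = [y]
    for ptr in reversed(backpointers):
        y = ptr[y]
        path.append(y)
    return list(reversed(path))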

Fig. 4. Activity identification accuracy for logs with different types of instance noise combined with 5 noise levels (I=0.0-0.4), with attribute noise K=0.0. [chart omitted]

Fig. 5. Average activity identification accuracy for logs with different types of instance noise (I=0.0, swap, replace, deletion, addition, random) combined with 5 attribute noise levels (K=0.0-0.4). [chart omitted]

5.2 Evaluation

To evaluate the performance, we compare the predicted labels to the actual activity labels of the messages; accuracy is the number of correctly assigned labels divided by the total number of labelled activities. Fig. 4 shows how different types of instance noise I affect the performance. In the graph, the x-axis shows the noise-free set (I=0.0), followed by the noise type and the percentage of noise in the log. For example, I=0.2_swap indicates a noisy log with 20% swap noise.

The results show that all the logs give nearly perfect accuracy (>0.985), indicating that the learning algorithm is robust to different types and amounts of instance noise, and the accuracy does not vary dramatically as the amount of instance noise increases. For each type of instance noise, the performance slightly decreases as the noise level increases from 0.1 to 0.4. Furthermore, compared with the Deletion and Addition noise types, the Swap and Replace noise types showed more sensitivity to the noise.

We also investigated the effect of attribute noise K on the performance. Fig. 5 shows the results of the three test data sets with combinations of different noises. We obtained a large number of results; for simplicity and as a summary, for each type of instance noise we calculated the average accuracy over all noise levels. As expected, the performance degrades as the amount of noise increases, since the prediction is based not only on message content, but also on the context of the sequence. Looking at the logs with attribute noise only (I=0.0), one can see that the algorithm still gives good performance for K <= 0.3. This is because our algorithm uses multiple feature representations: when some of them are absent, others still take effect.

6. CONCLUSIONS AND FUTURE WORK

In this paper, we have addressed the problem of automatically identifying process instances and activities from unstructured event logs such as E-mails. Our approach uses supervised machine learning algorithms to solve the problem in two steps. A sequential clustering algorithm is applied to identify instances, where only messages within a time window, rather than all messages in the event log, are needed. Activity identification is treated as a sequential labelling problem, where we focus on predicting the most likely labels by taking advantage not only of text content, but also of linkage relations. We have conducted experiments with different noise types to demonstrate the effectiveness of our approach. Although we focus on E-mail event logs, the concepts and algorithms are applicable to general unstructured event logs.

Our future plans include considering other features such as social network analysis for learning and further experiments on real data sets, and extending the algorithms to workflow management applications.

REFERENCES

[1] N. Kushmerick and T. Lau. Automated Email Activity Management: An Unsupervised Learning Approach. In IUI '05: Proc. of the 10th Intl. Conf. on Intelligent User Interfaces, pages 67-74. ACM Press, 2005.

[2] W. M. P. van der Aalst, M. Dumas, C. Ouyang, A. Rozinat, and H. M. W. Verbeek. Conformance Checking of Service Behaviour. ACM Transactions on Internet Technology, forthcoming, 2007.

[3] W. M. P. van der Aalst and A. Nikolov. Mining E-Mail Messages: Uncovering Interaction Patterns and Processes Using E-Mail Logs. International Journal of Intelligent Information Technologies, 4(3), 2008.

[4] W. M. P. van der Aalst, B. F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A. J. M. M. Weijters. Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering, 47(2):237-267, 2003.

[5] A. K. A. de Medeiros, A. J. M. M. Weijters, and W. M. P. van der Aalst. Genetic Process Mining: An Experimental Evaluation. Data Mining and Knowledge Discovery, 14:245-304, 2007.

[6] P. Haider, U. Brefeld, and T. Scheffer. Supervised Clustering of Streaming Data for Email Batch Detection. In Proc. of the 24th International Conference on Machine Learning, Corvallis, OR, 2007.

[7] J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labelling Sequence Data. In Proc. of the 18th International Conference on Machine Learning, Morgan Kaufmann, San Francisco, CA, 2001.

[8] M. Song, C. W. Günther, and W. M. P. van der Aalst. Trace Clustering in Process Mining. In BPM 2008 International Workshops, Milano, Italy, 2008.

[9] W. M. P. van der Aalst. Exploring the CSCW Spectrum Using Process Mining. Advanced Engineering Informatics, 2007.
