Discussion - tel-00431117, version 1

|Sα|   ψmin, ωmin = 0.2   ψmin, ωmin = 0.4   ψmin, ωmin = 0.6   ψmin, ωmin = 0.8
1      22.1 s             22.1 s             20.4 s             19.2 s
2      93.1 s             90.2 s             93.8 s             90.7 s
4      577.8 s            563.3 s            581.8 s            569.7 s
8      2024.2 s           1998.8 s           1994.3 s           1955.2 s

Table 6.4: Total execution time of each test by using soft beliefs.

satisfaction of τ in the rest of an input sequence (i.e., s\sα where sα |= Sα) becomes lower, and the step at line 8 in Algorithm 14 avoids matching all combinations of subsequences.

6.7 Discussion

In this chapter, we studied the generalization of unexpected sequence discovery with respect to concept hierarchies in a taxonomy of the data. We formalized the notions of generalized sequences and generalized sequence rules, and then proposed two new types of unexpected sequences: generalized unexpected sequences and soft unexpected sequences.

Generalized unexpected sequences are determined against generalized beliefs, which consist of generalized sequence rules and semantic contradictions between generalized sequences. A generalized belief system is constructed by the same procedure as the belief system of the original Muse framework presented in Chapter 4. In fact, generalized unexpected sequences are extracted in the same manner as in Muse, except that the matching between a sequence and a generalized sequence is performed with respect to a concept hierarchy.

Therefore, we did not list again the algorithms for discovering generalized unexpected sequences. In contrast, the algorithms and experiments for soft unexpected sequence discovery were studied in detail.
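The matching of a sequence against a generalized sequence with respect to a concept hierarchy can be illustrated by a minimal sketch. It is not one of the Muse algorithms: the taxonomy `PARENT`, its concept names, and the function names are hypothetical, and a sequence is assumed to be a list of itemsets, where a generalized itemset matches a concrete one if each of its concepts is an ancestor of (or equal to) some item of the concrete itemset.

```python
# Hypothetical taxonomy (child -> parent), assumed only for illustration.
PARENT = {
    "login": "session_event",
    "logout": "session_event",
    "session_event": "event",
    "download": "transfer",
    "upload": "transfer",
    "transfer": "event",
}


def ancestors_or_self(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def itemset_matches(general, concrete):
    """True if every concept in `general` covers at least one item of `concrete`."""
    return all(any(g in ancestors_or_self(c) for c in concrete) for g in general)


def generalized_subsequence(gen_seq, seq):
    """Greedy left-to-right test: is `gen_seq` matched, in order, inside `seq`?"""
    pos = 0
    for general in gen_seq:
        # Advance to the earliest itemset of `seq` covered by this generalized itemset.
        while pos < len(seq) and not itemset_matches(general, seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


seq = [{"login"}, {"download"}, {"logout"}]
matches_general = generalized_subsequence([{"session_event"}, {"transfer"}], seq)
matches_two_transfers = generalized_subsequence([{"transfer"}, {"transfer"}], seq)
```

In this toy input, the generalized sequence of a session event followed by a transfer is matched, whereas two successive transfers are not, since the sequence contains only one transfer item.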

Soft unexpected sequences are determined hierarchically with respect to generalized sequence rules, instead of by explicitly constructing a belief system. The main advantages of the SoftMuse approach to discovering soft unexpected sequences include:

1. Generalization of data is addressed by using generalized sequence rules;

2. Semantic contradictions no longer need to be specified: the determination of semantic contradiction is replaced by the computation of semantic relatedness/contradiction degrees;

3. The tau-fuzzy unexpectedness is integrated into SoftMuse, and the semantic relatedness/contradiction degree can also be described by using fuzzy sets.

tel-00431117, version 1 - 10 Nov 2009

Notice that in soft unexpected sequence discovery, when we consider the semantics, we can determine the semantic contradiction between two single items, for example, between “login” and “logout”. However, defining the semantic contradiction for an operational conjunction of items with temporal order is hard, and it remains an open problem in semantics-related data mining tasks.

In the Muse framework, the belief system consists of sequence rules and semantic contradiction relations, so that the unexpected sequences can be strictly determined within a supervised discovery process. However, because the automatic determination of semantic contradiction within SoftMuse is unsupervised, the discovered unexpected sequences need to be validated. Hence, in the next chapter, we will address the evaluation of the discovered unexpected sequences through a self-validation schema based on the notions of unexpected sequential patterns. We will also present the notions of unexpected implication rules for investigating the structural associations and predictions of unexpectedness in sequence databases.


Chapter 7 Unexpected Sequential Patterns and Implication Rules

In the previous chapters, we developed and extended the Muse framework for discovering various kinds of unexpected sequences, using fuzzy methods and generalizations over a data taxonomy. The next important task is therefore to evaluate the quality of the discovered unexpected sequences, and then to extract useful information from them in order to study their structure and predict the unexpectedness. In this chapter, we propose the notions of unexpected sequential patterns and unexpected implication rules for this purpose.

A part of the work presented in this chapter has been published in the journal La Revue des Nouvelles Technologies de l’Information (RNTI) and in the International Journal of Business Intelligence and Data Mining (IJBIDM).

7.1 Introduction

We have discussed a problem inherent in unexpected sequence discovery: the number and quality of the discovered sequences strongly depend on the belief system, where the correctness of the beliefs is ensured by domain expert knowledge.

On the other hand, the discovered unexpected sequences may include low-frequency noisy data from the database, which cannot be avoided in the discovery process whenever such data violate some beliefs.

Example 31 Let us consider again the example discussed at the end of Chapter 4.

S = {
  s₁ = · · · · (a)(b) · · · · (c) · · · ·
  s₂ = · · · · (a)(b) · · · · (c) · · · ·
  s₃ = · · · (a)(b) · · · · (c) · · · ·
  s₄ = · · · · (a)(b) · · · · (c) · · ·
  s₅ = (a) · · · · (b)
}.


Given a belief b consisting of a sequence implication rule ⟨(a)(b)⟩ →* ⟨(c)(d)⟩, all sequences in the sequence set S are α-unexpected, because for each sequence s ∈ S we have ⟨(a)(b)⟩ ⊑ s and ⟨(a)(b)(c)(d)⟩ ⋢ s. However, the sequence s₅ has a completely different structure from the other sequences; it can be considered as noisy data that should not be covered by the belief b.

Obviously, the approaches proposed in the previous chapters cannot filter a sequence like s₅ out of the resulting set of unexpected sequences. Moreover, after examining the frequent common structure of the remaining unexpected sequences, we find that a rule ⟨(a)(b)(c)⟩ →* ⟨(d)⟩ states the unexpectedness better.
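The reasoning of Example 31 can be replayed on toy data. The following is only a sketch under stated assumptions: the elided parts of s₁ to s₅ are filled with a hypothetical filler item x, sequences are modeled as lists of itemsets, and the test applied is exactly the condition quoted in the example (the premise is a subsequence of s while the premise followed by the conclusion is not), not the full Muse extraction procedure.

```python
def is_subsequence(pattern, seq):
    """True if each itemset of `pattern` is included, in order, in a distinct itemset of `seq`."""
    pos = 0
    for itemset in pattern:
        # Greedily take the earliest itemset of `seq` that contains the current one.
        while pos < len(seq) and not itemset <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def violates(premise, conclusion, seq):
    """Condition of the example: premise occurs, premise followed by conclusion does not."""
    return is_subsequence(premise, seq) and not is_subsequence(premise + conclusion, seq)


a, b, c, d, x = {"a"}, {"b"}, {"c"}, {"d"}, {"x"}  # x is a hypothetical filler item

S = {
    "s1": [x, a, b, x, c, x],
    "s2": [x, a, b, x, c, x],
    "s3": [x, a, b, x, c, x],
    "s4": [x, a, b, x, c, x],
    "s5": [a, x, b],
}

# Every sequence, including the noisy s5, violates the belief (a)(b) ->* (c)(d).
unexpected = [name for name, s in S.items() if violates([a, b], [c, d], s)]

# Only s1..s4 share the common structure (a)(b)(c) behind the refined rule.
shares_abc = [name for name in unexpected if is_subsequence([a, b, c], S[name])]
```

With these assumed fillers, `unexpected` lists all five sequences, while `shares_abc` leaves out s5, which is the observation that motivates the validation step studied in this chapter.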

Therefore, in this chapter, we study the validation of the discovered unexpected sequences for the evaluation – interpretation – update process shown in Figure 7.1.

[Figure 7.1 (diagram not reproduced). Labeled components: Belief System; Multiple Unexpected Sequence Extraction; Prior Knowledge; Sequence Database; Unexpected Sequences; Novel Knowledge.]

Figure 7.1: The evaluation – interpretation – update process.

We propose a self-validation process for evaluating unexpected sequences with the notions of unexpected sequential patterns. In this process, for a set of unexpected sequences, we first discover unexpected sequential patterns, which include internal and external unexpected sequential patterns for depicting the frequent common structures inside and outside the unexpectedness.

Hence, the more an unexpected sequence contributes to the generated unexpected sequential patterns, the more reliable it is.
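As a rough illustration of this idea, and not the measure defined later in the chapter, the contribution of an unexpected sequence can be approximated by the fraction of discovered patterns it contains. The function name `contribution`, the pattern list, and the filler item x below are all assumptions made for this sketch.

```python
def is_subsequence(pattern, seq):
    """True if each itemset of `pattern` is included, in order, in a distinct itemset of `seq`."""
    pos = 0
    for itemset in pattern:
        while pos < len(seq) and not itemset <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def contribution(seq, patterns):
    """Hypothetical score: fraction of the given patterns that `seq` contains."""
    if not patterns:
        return 0.0
    return sum(is_subsequence(p, seq) for p in patterns) / len(patterns)


# Assumed patterns, as if mined from the unexpected sequences of Example 31.
patterns = [
    [{"a"}, {"b"}],
    [{"a"}, {"b"}, {"c"}],
    [{"b"}, {"c"}],
]

s1 = [{"x"}, {"a"}, {"b"}, {"x"}, {"c"}]  # x is a hypothetical filler item
s5 = [{"a"}, {"x"}, {"b"}]

score_s1 = contribution(s1, patterns)
score_s5 = contribution(s5, patterns)
```

Under these assumptions s1 contains every pattern while s5 contains only one of the three, so s5 would be ranked as the less reliable unexpected sequence.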

Further, by mining sequential patterns in different compositions of unexpected sequences, we also propose the notions of unexpected implication rules, including the unexpected class rule, the unexpected association rule, and the unexpected occurrence rule, for understanding what happens in association with the unexpectedness, what implies the unexpectedness, and what the unexpectedness implies.

The approaches proposed in this chapter have close connections with sequential pattern mining [AS95].

In the past fifteen years, many approaches have been proposed and developed that focus on improving the execution time and memory usage of sequential pattern mining, such as Apriori ([AS95]), GSP ([SA96b]), PSP ([MCP98]), PrefixSpan ([PHMAP01]), SPADE ([Zak01]),
