
Discussion

From the document tel-00431117, version 1 - 10 Nov 2009 (pages 102-106)

In this chapter, we first extended the framework Muse by taking into account the fuzziness in the unexpectedness of sequence occurrence (tau-fuzzy) and developed the approach Taufu. We then proposed the notion of fuzzy recurrence sequence, with which we developed the approach Ufr to discover unexpected fuzzy recurrences within the framework Muse. Experiments on various real Web server access data show the performance of the approaches Taufu and Ufr.

We studied the fuzziness in unexpected sequence occurrence, where the notion of tau-fuzzy is based on the gap between the premise and conclusion sequences, and the notion of fuzzy recurrence sequence is based on the number of sequence occurrences.

Fuzzy association rules can also be considered, in a much broader way, for discovering unexpectedness in data. A more general model is the following: given a rule “if X is A, then Y is B”, if we consider that “A semantically contradicts C” or “B semantically contradicts D”, then “if X is C, then Y is B” or “if X is A, then Y is D” is unexpected. For instance, if “age is old → salary is high” corresponds to prior knowledge, then “age is young → salary is high” or “age is old → salary is low” can be considered as unexpected.

The same approach can also be extended to gradual rules: if prior knowledge shows that “age increases → salary increases”, then “age increases → salary decreases” is unexpected, etc.
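The general model for fuzzy rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unexpected_rules`, the tuple encoding of a rule, and the dictionary of semantic contradictions are all hypothetical and are not part of the framework Muse.

```python
# A rule "if X is A, then Y is B" is encoded as ((X, A), (Y, B)).
# `contradicts` maps a fuzzy label to the labels that semantically
# contradict it (hypothetical encoding, for illustration only).

def unexpected_rules(rule, contradicts):
    """Derive the rules that are unexpected with respect to `rule`."""
    (x, a), (y, b) = rule
    derived = []
    # "A semantically contradicts C" gives "if X is C, then Y is B".
    for c in contradicts.get(a, ()):
        derived.append(((x, c), (y, b)))
    # "B semantically contradicts D" gives "if X is A, then Y is D".
    for d in contradicts.get(b, ()):
        derived.append(((x, a), (y, d)))
    return derived


contradicts = {"old": ["young"], "high": ["low"]}
prior = (("age", "old"), ("salary", "high"))

for (x, a), (y, b) in unexpected_rules(prior, contradicts):
    print(f"{x} is {a} -> {y} is {b}")
```

With the prior knowledge “age is old → salary is high”, the sketch derives the two unexpected rules given in the text.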

The fuzzy extensions presented in this chapter improve the flexibility of representing unexpectedness within the framework Muse. Our future research work includes the construction and discovery of more general models of unexpected sequences and rules within the framework of fuzzy association rules and fuzzy sequential patterns. On the other hand, in order to improve the flexibility of representing prior knowledge (i.e., the construction of the belief system), we study the generalization problem of the framework Muse in the next chapter.


Chapter 6

Generalizations in Unexpected Sequence Discovery

In the previous chapter, we extended the framework Muse with fuzzy methods, which improve the interpretability of discovered unexpected sequences. On the other hand, the effectiveness of the framework Muse, with or without fuzzy extensions, depends on the relevancy of beliefs, where the specification of sequence rules and semantic contradictions with respect to prior knowledge is an essential but complex task. To reduce the complexity of constructing beliefs, we present a generalized approach to discover unexpected sequences with concept hierarchies.

A part of the work presented in this chapter has been published in the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS).

6.1 Introduction

The framework Muse proposed in Chapter 4 discovers unexpected sequences with respect to the beliefs based on prior knowledge, where the effectiveness of Muse depends on the relevancy of beliefs. However, for constructing beliefs, the specification of sequence rules and semantic contradictions is an essential but complex task.

On the other hand, in real-world database applications, many data have a human-defined taxonomy that is often organized in hierarchies, where the semantics of an item are represented with respect to a hierarchical taxonomy of concepts.

Hence, although beliefs can be carefully specified with expertise in the application domain, enumerating the complete sets of rules and semantic contradiction relations at the item level is often hard work. The following example illustrates this problem.

Example 24 Let us consider the instance addressed in Example 11, where customer transaction records are stored as the items purchased by a customer per transaction. Assume that in each product category, including Sci-Fi Novel, Action Movie DVD, Sci-Fi Movie DVD, Rock Music CD, and Classical Music CD, there are 10 different products, that is, 10 distinct items under each end concept with respect to the hierarchical taxonomy shown in Figure 6.1.

[Figure: tree of concepts rooted at Product, with children Book, CD, and DVD; below them Novel (under Book), Music (under CD), and Movie (under DVD); the end concepts Sci-Fi (under Novel), Classical and Rock (under Music), and Action and Sci-Fi (under Movie); the items form the leaves under the end concepts.]

Figure 6.1: Hierarchical taxonomy of products.

Further, we assume that the product relations and customer transaction records are stored in a database like the relations listed in Table 6.1.

Prod.ID   Prod.Category        Prod.Name
· · ·     · · ·                · · ·
12101     Book.Novel.SciFi     · · ·
12102     Book.Novel.SciFi     · · ·
· · ·     · · ·                · · ·
22101     CD.Music.Classical   · · ·
22102     CD.Music.Classical   · · ·
· · ·     · · ·                · · ·
22301     CD.Music.Rock        · · ·
22302     CD.Music.Rock        · · ·
· · ·     · · ·                · · ·
32201     DVD.Movie.Action     · · ·
32202     DVD.Movie.Action     · · ·
· · ·     · · ·                · · ·
32301     DVD.Movie.SciFi      · · ·
32302     DVD.Movie.SciFi      · · ·
· · ·     · · ·                · · ·

Cust.ID   Trans.ID   Items
· · ·     · · ·      · · ·
C00206    T000586    11105 12108
C00206    T000977    12109
C00206    T001108    32201 32202
C00206    T001210    32205 32307
C00206    T001555    21209
C00206    T001809    22303
C00206    T002112    22507
· · ·     · · ·      · · ·
C01052    T001375    12101
C01052    T001664    22305 32301
C01052    T001792    12108 32308
C01052    T001860    32201 32202 32302
C01052    T002276    31202
C01052    T002279    22101
· · ·     · · ·      · · ·

Table 6.1: Product relations and customer transaction records.
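To make the link between items and concepts concrete, the following minimal Python sketch maps item identifiers to their categories and generalizes a sequence of transactions to the concept level. It uses only the product rows visible in Table 6.1; the names `PRODUCT_CATEGORY`, `concept_path`, and `generalize` are hypothetical, and the example sequence keeps only those transactions of customer C01052 whose items all appear in the visible rows, so it is an illustration rather than the customer's full sequence.

```python
# Product rows visible in Table 6.1 (product names are elided there).
PRODUCT_CATEGORY = {
    12101: "Book.Novel.SciFi",
    12102: "Book.Novel.SciFi",
    22101: "CD.Music.Classical",
    22102: "CD.Music.Classical",
    22301: "CD.Music.Rock",
    22302: "CD.Music.Rock",
    32201: "DVD.Movie.Action",
    32202: "DVD.Movie.Action",
    32301: "DVD.Movie.SciFi",
    32302: "DVD.Movie.SciFi",
}


def concept_path(category):
    """Split a dotted category into its concepts, most general first."""
    return category.split(".")


def generalize(sequence):
    """Replace each item by its category; duplicates in a transaction collapse."""
    return [
        sorted({PRODUCT_CATEGORY[item] for item in itemset})
        for itemset in sequence
    ]


# Transactions T001375, T001860 and T002279 of customer C01052.
sequence = [[12101], [32201, 32202, 32302], [22101]]

print(concept_path(PRODUCT_CATEGORY[12101]))
print(generalize(sequence))
```

At the concept level, the two Action movies in the second transaction collapse into a single category, which is the kind of reduction that generalization is meant to provide.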

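In such a database system, to discover the customer transaction sequences that are unexpected with respect to the behaviors described in the belief

\[
b_3 = \{\langle (\text{Sci-Fi-Novel})(\text{Action-Movie}\ \ \text{Sci-Fi-Movie}) \rangle \rightarrow \langle (\text{Rock-Music}) \rangle\} \wedge \{\langle (\text{Rock-Music}) \rangle \not\simeq_{sem} \langle (\text{Classical-Music}) \rangle\}
\]

of Example 11, each item should be specified according to the SeqMatch routine in the approach Muse, that is, in the form of the following beliefs:

\[
\begin{aligned}
&\cdots, \\
b_i &= \{\langle (12101)(32201\ \ 32301) \rangle \rightarrow \langle (22301) \rangle\} \wedge \{\langle (22301) \rangle \not\simeq_{sem} \langle (22101) \rangle\}, \\
b_j &= \{\langle (12102)(32201\ \ 32301) \rangle \rightarrow \langle (22301) \rangle\} \wedge \{\langle (22301) \rangle \not\simeq_{sem} \langle (22101) \rangle\}, \\
&\cdots.
\end{aligned}
\]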

Hence, there exist $10^4$ sequence rules and $10^2$ semantic contradiction relations that cover all possible combinations of items, and it is necessary to generate $10^5$ beliefs in total instead of a single belief on the generalization of the hierarchical taxonomy.
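The counts in this example can be checked by brute-force enumeration. The Python sketch below is a minimal illustration under one loudly stated assumption: the 10 items of each end concept are taken to have 10 consecutive identifiers starting at the first identifier shown in Table 6.1 (the table elides most of them). The helper `item_ids` and the dictionary `ITEMS` are hypothetical names.

```python
from itertools import product


def item_ids(first_id, count=10):
    """Assumed identifiers: `count` consecutive IDs starting at `first_id`."""
    return list(range(first_id, first_id + count))


ITEMS = {
    "Sci-Fi-Novel": item_ids(12101),
    "Action-Movie": item_ids(32201),
    "Sci-Fi-Movie": item_ids(32301),
    "Rock-Music": item_ids(22301),
    "Classical-Music": item_ids(22101),
}

# Item-level rules: one item for each of the four concepts of the rule in b3.
rules = list(product(
    ITEMS["Sci-Fi-Novel"],
    ITEMS["Action-Movie"],
    ITEMS["Sci-Fi-Movie"],
    ITEMS["Rock-Music"],
))

# Item-level contradictions: every Rock item against every Classical item.
contradictions = list(product(ITEMS["Rock-Music"], ITEMS["Classical-Music"]))

# A belief pairs a rule with a contradiction on that rule's conclusion item.
beliefs = [
    (rule, (rule[3], classical))
    for rule in rules
    for classical in ITEMS["Classical-Music"]
]

print(len(rules), len(contradictions), len(beliefs))
```

Each rule fixes its conclusion item, so only the 10 Classical items remain free in the contradiction, which is why the number of beliefs is ten times the number of rules rather than the product of both counts.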

Indeed, generalizations have been widely studied in mining association rules [SA95, HF95, HMWG98, TS98, HW02, TL07, KZC08] and sequential patterns [SA96b, TS98, LLW02, dAdSRJ03, MPT04, HY06] during the past decade.

Srikant and Agrawal first studied the generalization problem in association rule mining [SA95], where the taxonomy on items is considered as an is-a hierarchy. For instance, according to the hierarchy shown in Figure 6.1, we can say that “Sci-Fi-Novel is-a Novel is-a Book”. The proposed approach therefore discovers association rules like (Novel Rock-Music) → (Action-Movie) by considering each concept as an item and pruning itemsets that contain both an item and its ancestor.
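The is-a hierarchy and the ancestor-based pruning condition can be sketched as follows. This is a simplified illustration, not the algorithm of [SA95]: the `PARENT` map encodes the part of Figure 6.1 named in the text, and the function names `ancestors` and `has_item_with_ancestor` are hypothetical.

```python
# Child -> parent links of the is-a hierarchy (the part of Figure 6.1
# that is named in the text).
PARENT = {
    "Sci-Fi-Novel": "Novel", "Novel": "Book", "Book": "Product",
    "Classical-Music": "Music", "Rock-Music": "Music",
    "Music": "CD", "CD": "Product",
    "Action-Movie": "Movie", "Sci-Fi-Movie": "Movie",
    "Movie": "DVD", "DVD": "Product",
}


def ancestors(concept):
    """Return the ancestors of `concept`, nearest first."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def has_item_with_ancestor(itemset):
    """True if the itemset contains both an item and one of its ancestors."""
    items = set(itemset)
    return any(a in items for item in items for a in ancestors(item))


print(ancestors("Sci-Fi-Novel"))
print(has_item_with_ancestor({"Novel", "Rock-Music"}))
print(has_item_with_ancestor({"Novel", "Sci-Fi-Novel"}))
```

An itemset such as (Novel Sci-Fi-Novel) is redundant because Sci-Fi-Novel already implies Novel, so a candidate generator can discard it before counting support.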

This work has been extended to discover generalized sequential patterns in [SA96b], which are maximal frequent sequences like “Novel and Rock-Music followed by Action-Movie, then followed by item 32301”. Many approaches have been developed to improve the efficiency of mining generalized association rules and sequential patterns [HMWG98, HW02, LLW02, dAdSRJ03, MPT04, HY06, TL07, KZC08], which effectively reduce the number of discovered patterns, rules, or sequences in comparison with the results obtained without data generalization.

Therefore, to benefit from high-level knowledge on the taxonomy of data, in this chapter we propose a generalized approach to discover unexpected sequences with concept hierarchies in order to reduce the complexity of belief construction.

The rest of this chapter is organized as follows. We first formalize the definitions of concept hierarchy and generalized sequences in Section 6.2, then propose the notion of generalized beliefs in Section 6.3. In Section 6.4, we discuss the unexpected sequences in hierarchical data with respect to generalized beliefs, and we further propose a method for determining the semantic contradiction between generalized sequences with respect to concept hierarchies, which allows unexpected sequences to be discovered without specifying semantic contradictions. We present the experiments on discovering unexpected sequences with concept hierarchies in Section 6.6, and Section 6.7 is a discussion.
