• Aucun résultat trouvé

SubSect — An Interactive Itemset Visualization

N/A
N/A
Protected

Academic year: 2022

Partager "SubSect — An Interactive Itemset Visualization"

Copied!
2
0
0

Texte intégral

(1)

SubSect — An Interactive Itemset Visualization

?

Joey De Pauw1, Sandy Moens1, and Bart Goethals1,2

1 University of Antwerp, Belgium

2 Monash University, Australia

1 Introduction

Itemsets and association rules are among the most simple and intuitive patterns that are used to explore transaction datasets. However, they lack meaning with- out both context and domain knowledge. Typically a user has to sift through hundreds of these patterns before finding an interesting one, losing sight of the forest for the trees. Furthermore, interestingness is a subjective measure that can only be approximated by objective metrics or features [3].

In previous work this problem has been tackled for instance by sorting and filtering patterns based on different metrics [3] or by trying to minimize the number of reported patterns to the most descriptive subset [1]. Another approach is to represent patterns in informative visualizations and rely on the end user to find what is interesting in their respective domain [2].

We propose a novel itemset and association rule visualization that makes it possible to inspect, assess, and compare patterns at a glance. This can not only save time and effort, but also reduce errors introduced by misconceptions. Our visualization is based on the double decker plot from Hofmann et al. [2] and exploits the monotonicity property, which states that itemsets have a lower or equal support compared to the support of their subsets.

2 Visualization

Consider the example in Figure 1a. Every item in the itemset is represented in the center. The arcs around the center items show three levels of itemsets that can be formed from these items. For example, the blue full circle near the center includes all four itemsA,B,C andD, and has a frequency of 0.2 as indicated by the label and its radius. The other segments represent subsets, like for example the cyan arc which spans itemsA,B and C. In correspondence with the higher frequency of this itemset (0.25), its arc also has a proportionally larger radius.

In every image only the most interesting and informative subsets are ren- dered: for a k-itemset these are the k-1-itemsets and the 1-itemsets. Together this combination of subsets provides the most useful information: the 1-itemsets give a global context and the k-1-itemsets place the k-itemset in a local con- text. Itemsets larger than one are given a unique color, making it easier to link multiple instances of the visualization that have items in common.

?Copyright c2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0).

(2)

2 J. De Pauw et al.

A B

D C

0.80 A

0.72D 0.67 B

0.4 8

C 0.42

0.31

0.2 6

0.25 0.20

(a) Itemset{A, B, C, D} (b) Itemset{A, B, C}

Fig. 1.Our visualization for the arbitrary itemset{A, B, C, D}(a) and for one of its subsets (b). Each arc represents a set of items and shows its respective support.

Furthermore, the visualization is equipped with two interactions for maximal usability:dive deeper andα-conditional view3. Animations like hover highlight- ing indicate the presence of these interactions and gradual transition animations ease the transition between “states” of the visualization, making the effect of the interactions more clear. Clicking on the cyan arc for example will dive into its respective itemset{A, B, C}. An animation shows that itemD is removed from the center and the cyan arc becomes a full circle. Three new subsets are now visible. The result is shown in Figure 1b. Naturally this action can be repeated from the new view to dive deeper or the user can choose to go back to the top level with thereset button that just became available.

Similar to the interaction for selecting an itemset to dig deeper, it is also possible to click a single item (in the center or on the outer edges) and add it to theαset or the“scope”. In this α-conditional view, the scope is visible on a smaller visualization to the left. On the right-hand side, we see the remaining items and itemsets, but now with their frequencies relative to the scope.

References

1. Calders, T., Goethals, B.: Non-derivable itemset mining. Data Mining and Knowl- edge Discovery 14(1), 171–206 (2007)

2. Hofmann, H., Siebes, A.P., Wilhelm, A.F.: Visualizing association rules with in- teractive mosaic plots. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 227–235. ACM (2000) 3. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure

for association patterns. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 32–41. ACM (2002)

3 A live version with examples can be found onhttps://joeydp.github.io/SubSect/.

Références

Documents relatifs

Infection with the protozoan Toxoplasma gondii is one of the most frequent parasitic infections worldwide. This obligate intracellular parasite was first described

An infinitesimal increase of the volume ∆v brings an increase in p int , so that the piston moves until another equilibrium point with a bigger volume. On the other hand, a

Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., Lakhal, L.: Mining minimal non-redundant association rules using frequent closed itemsets.. Kryszkiewicz, M.:

In Section 4, we de- scribe our approach, named Videam platform (VIsual DrivEn dAta Miner), and we explain how the visualization can drive the different steps of the data mining

The combination of a simple input file format, a set of clustering and filtering algorithms linked together with the visualization tool provides a powerful tool for data analysis

The goal of the study is to analyze, from the perspective of researchers, the design of interfaces in order to: - evaluate the adherence of real-world interfaces to the

We made a difference between those cases where dedicated visualization techniques were developed for an interactive surface (code: custom) and others that just used or slightly

In fact, most of us (and I include myself here) have done rather worse than that – in that having identified the need to think beyond growth we have then reverted to the more