• Aucun résultat trouvé

Conclusion, Discussion, and Perspectives

In this chapter, we design an online environment for mining inter-dimensional association rules from data cubes as a part of a platform called CubeMining. We use a guided rule mining facility, which allows users to limit the mining process to a specific context defined by a particular portion in the mined data cube. We also provide a computation of the support and the confidence of association rules when a SUM-based measure is used. This issue is quite interesting since it expresses associations which do not restrict users’ analysis to associations driven only by the traditional COUNT measure. The support and the confidence may lead to the generation of large number of association rules. Therefore, we propose to evaluate interestingness of mined rules according to two additional descriptive criteria (lift and loevinger). These criteria can express the relevance of rules in a more precise way than what is offered by the support and the confidence. Our association rule mining procedure is an adaptation of the traditional level-wise apriori algorithm to multidimensional data. In order to make extracted knowledge easier to interpret and exploit, we provide a graphical representation for the visualization of inter-dimensional association rules in the multiinter-dimensional space of the mined data cube. Empirical analysis showed the efficiency of our proposal and the acceptable runtime of our algorithm.

In the current development of our mining solution, we integrate SUM-based mea-sures in the computation of interestingness criteria of extracted association rules.

However, this choice assumes that the selected measure is additive and has only positive values. In the suspicious regions data cube, the surface of regions is an appropriate measure for the computation of the revisited criteria. Nevertheless, the total boundary length of regions can not be used for that computation since the SUM of boundary lengths does not make concrete sense. In some cases, an OLAP context may be expressed by facts with non-additive or negative measures. For instance, in the traditional example of a sales data cube, the average of sales is typically a non-additive measure. Furthermore, the profit of sales is also an OLAP measure that can have negative values. In such situations, we obviously need a more appropriate interestingness estimation of association rule to handle a wider spectrum of measure types and aggregate functions (e.g., AVG, MAX).

Our proposal provides inter-dimensional association rules with non-repetitive predicates. Such rules consist of a set of predicate instances where each one repre-sents a modality coming from a distinct dimension. This kind of association rules helps explain a value of a dimension by other values drawn from other dimensions.

Nevertheless, an inter-dimensional association rule does not explain a modality by other ones from the same dimension. For instance, the latter type of rules is not able to explain the sales of a product by those of other products or even other product categories. In order to cope with this issue, we also need to extend our proposal in order to cover the mining of inter-dimensional association rules with repetitive predicates as well as intra-dimensional association rules. In addition, these new kinds of associations should profit from dimension hierarchies and allow modalities from multiple granularity levels.

The association rule mining process in our environment is based on an adaptation of the traditional level-wise apriori algorithm to multidimensional data. The anti-monotony property (Agrawal et al., 1993) allows a fast search of frequent itemsets, and the guided mining of association rules we express as a meta-rule limits the search space according to the analysis objectives of users. However, some recent studies have shown the limitations of Apriori and privileged the notion of frequent closed itemsets like in close (Pasquier, Bastide, Taouil, & Lakhal, 1999), pascal (Bastide, Taouil, Pasquier, Stumme, & Lakhal, 2000), closet (Pei, Han, & Mao, 2000), Charm (Zaki & Hsiao, 2002), and galicia (Valtchev, Missaoui & Godin, 2004).

Finally, measures are used in our environment for computing interestingness criteria.

We plan to study the semantics of association rules when measures appear in the expression of rules.

References

Agrawal, R., Imieliński, T., & Swami, A. (1993, May). Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1993) (pp 207-216). Washington, DC.

Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., & Lakhal, L. (2000). Mining fre-quent patterns with counting inference. SIGKDD Explor. Newsl., 2, 66-75.

Ben Messaoud, R., Boussaid, O., & Loudcher, R. S. (2006a). Efficient multidi-mensional data representation based on multiple correspondence analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), Philadelphia (pp. 662-667).

Ben Messaoud, R., Boussaid, O., & Loudcher, R. S. (2006b). A data mining-based OLAP aggregation of complex data: Application on XML documents. Inter-national Journal of Data Warehousing and Mining, 2(4), 1-26.

Ben Messaoud, R., Loudcher, R. S., Boussaid, O., & Missaoui, R. (2006). Enhanced mining of association rules from data cubes. In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP (DOLAP 2006) (pp.

11-18). Arlington, VA.

Bertin, J. (19981). Graphics and graphic information processing. de Gruyter.

Brin, S., Motwani, R., & Silverstein, C. (1997). Beyond market baskets: General-izing association rules to correlations. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1997) (pp 265-276).

Chaudhuri. S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1), 65-74.

Chen, Q., Dayal. U., & Hsu, M. (2000). An OLAP-based scalable Web access analysis engine. In Proceedings of the 2nd International Conference on Data Warehous-ing and Knowledge Discovery (DaWaK 2000) (pp. 210-223). London.

Chen, Q., Dayal. U., & Hsu. M. (1999). A distributed OLAP infrastructure for e-commerce. In Proceedings of the 4th IECIS International Conference on Cooperative Information Systems (COOPIS 1999) (pp. 209-220). Edinburgh, Scotland.

Dong, G., Han, H., Lam, J. M. W., Pei, J., & Wang, K. (2001). Mining multi-di-mensional constrained gradients in data cubes. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001) (pp. 321-330). Rome, Italy.

Goil, S., & Choudhary, A. (1998). High performance multidimensional analysis and data mining. In Proceedings of the 1st International Workshop on Data Warehousing and OLAP (DOLAP 1998) (pp. 34-39). Bethesda, Maryland.

Heath, M., Bowyer, K., Kopans, D., Moore, R., & Jr, P.K. (2000). The digital da-tabase for screening mammography. In Proceedings of the 5th International Workshop on Digital Mammography, Toronto, Canada.

Imieliński, T., Khachiyan, L., & Abdulghani, A. (2002). Cubegrades: Generalizing association rules. Data Mining and Knowledge Discovery, 6(3), 219-258.

Inmon, W. H. (1996). Building the data warehouse. John Wiley & Sons.

Kamber, M., Han, J., & Chiang, J. (1997). Multi-dimensional association rules using data cubes. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 1997) (pp. 207-210). Newport Beach, CA.

Kimball, R. (1996). The data warehouse toolkit. John Wiley & Sons.

Lallich, S., Vaillant, B., & Lenca, P. (2005). Parametrised measures for the evalua-tion of associaevalua-tion rules interestingness. In Proceedings of the 6th International Symposium on Applied Stochastic Models and Data Analysis (ASMDA 2005) (pp. 220-229). Brest, France.

Lenca, P., Vaillant, B., & Lallich, S. (2006). On the robustness of association rules.

In Proceedings of the 2006 IEEE International Conference on Cybernetics and Intelligent Systems (CIS 2006) (pp. 596-601). Bangkok, Thailand.

Loevinger, J. (1974). A systemic approach to the construction and evaluation of tests of ability. Psychological Monographs, 61(4).

Nestorov, S., & Jukić, N. (2003). Ad-hoc association-rule mining within the data warehouse. In Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS 2003) (pp. 232-242).

Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of associa-tion rules using closed itemset lattices. Informaassocia-tion Systems, 24(1), 25-46.

Pei, J., Han, J., & Mao, R. (2000). Closet: An efficient algorithm for mining fre-quent closed itemsets. In Proceedings of the ACM SIGMOD International Workshop on Data Mining and Knowledge Discovery (DMKD 2000) (pp.

21-30). Dallas, Texas.

Tjioe, H. C., & Taniar, D. (2005). Mining association rules in data warehouses.

International Journal of Data Warehousing and Mining, 1(3), 28-62.

Valtchev, P., Missaoui, R., & Godin, R. (2004). Formal concept analysis for knowledge and data discovery: New challenges. In Proceedings of the 2nd International Conference on Formal Concept Analysis (ICFCA 2004) (pp 352-371).

Zaki, M. J., & Hsiao, C. J. (2002). CHARM: An efficient algorithm for closed itemset mining. In Proceeding of the 2nd SIAM International Conference on Data Mining (SDM’02), Arlington, VA.

Zhu, H. (1998). Online analytical mining of association rules. Master’s thesis, Simon Faster University, Burnaby, British Columbia, Canada, December.

Documents relatifs