• Aucun résultat trouvé

kDCI: on using direct count up to the third iteration

N/A
N/A
Protected

Academic year: 2022

Partager "kDCI: on using direct count up to the third iteration"

Copied!
1
0
0

Texte intégral

(1)

kDCI: on using direct count up to the third iteration

Claudio Lucchese HPC Lab of ISTI-CNR

Pisa, Italy.

claudio.lucchese@isti.cnr.it

Salvatore Orlando Universit` a Ca’ Foscari

Venezia, Italy.

orlando@dsi.unive.it

Raffaele Perego HPC Lab of ISTI-CNR

Pisa, Italy.

raffaere.perego@isti.cnr.it

1. Problem Statement and Solution

In Apriori-like algorithms, one of the most consum- ing operation during the frequent itemset mining pro- cess is the candidate search. At each iteration k the whole datasetDhas to be scanned, and for each trans- actiont in the database, every of its subsets of length k is generated and searched within the candidates. If a candidate is matched, it means that the transaction subsumes the candidate, and therefore its support can be incremented by one.

This search is very time demanding even if appropri- ate data structures are used to gain a logarithmic cost.

In [3, 2] we introduced adirect count technique which allows constant time searches for candidates of length 2. Given the set ofnfrequent single items, candidates of length 2 are stored using an upper triangular matrix n×n DC2with ( n2 ) cells, such thatDC2(i, j) stored the support of the 2-itemset{ij}. As shown in [1] the direct count procedure can be extended to the third it- eration using ann×n×nmatrixDC3with ( n3 ) cells, whereDC3(i, j, l) is the support of the 3-itemset{ijl}.

We thus introduced such technique in the last ver- sion of kDCI, which is level-wise hybrid algorithm.

kDCI stores the dataset with an horizontal format to disk during the first iterations. After some itera- tion the dataset may become small enough (thanks to anti-monotone frequency pruning) to be stored in the main memory in a vertical format, and after that the algorithm goes on performing tid-lists intersections to retrieve itemsets supports, and searches among can- didates are not needed anymore. Usually the dataset happens to be small enough at most at the fourth iter- ation.

2. Experiments and Conclusion

The experiments show that the improvement given by this optimization is sensible in some cases. The time needed for the third iteration is halved. Note that

0 2 4 6 8 10 12 14

0 2 4 6 8 10 12

Time spent for each itaration of kDCI

iteration

time (sec.)

kDCI kDCI with 3rd iteration direct count

Figure 1. The dataset used was T10I4D100K with a minimum absolute support of 10.

the time spent during the firsts iteration is significant for the global effectiveness of the algorithm on most datasets. In fact, in the test performed, the total time was reduced from 19 sec. to 14 sec., which means an overall speed up of about 20%.

We acknowledge the authors C.Targa, A.Prado and A.Plastino of [1], who showed the effectiveness of such optimization.

References

[1] C.Targa, A.Prado, and A.Plastino. Improving direct counting for frequent set mining. Technical report, In- stituto de Computa¸c˜ao, UFF RT-02/09 2003.

[2] Claudio Lucchese, Salvatore Orlando, Paolo Palmerini, Raffaele Perego, and Fabrizio Silvestri. kdci: a multi- strategy algorithm for mining frequent sets. InProceed- ings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations, November 2003.

[3] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri.

Adaptive and resource-aware mining of frequent sets. In Proc. The 2002 IEEE International Conference on Data Mining (ICDM02), page 338345, 2002.

Références

Documents relatifs

The technique for the strong normalisation of the lower layer (section 3.2) adapts Barbanera and Berardi’s method based on a symmetric notion of re- ducibility candidate [BB96] and

These elements are grouped into propositions that are: the software measurement literature is rich of useful count metrics; the Oc- cam’s razor principle defends counts;

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

Pour ce faire, il nous faut nous tourner vers nos anciens, vers la sagesse et les savoirs des peuples autochtones qui sont, comme le dit si bien Jean Malaurie, « en réserve » pour

Alternatively, TRV may be revealed by the presence, in a hadronic two-body weak decay amplitude, of a ”weak” phase, besides the one produced by strong Final State Interactions

Table 1 shows the operators’ profit levels resulting from a compensation of the USO net cost according to criterion 1 with external financing, a pay or play me- chanism

Certaines BCIs utilisent ainsi les signaux EEG afin de notamment contrôler de la musique [23, 13], mais du travail reste à faire avant d’être capable de concevoir une BCI capable

systèmes dynamiques aléatoires, réseaux des applications couplées, théorie des valeurs extrêmes, indice extrémal, opérateurs de transfert, synchronisation, ensemble minimal,