Reusable Component Mining from Similar Potential Ones

3.3 Identifying Classes Composing Reusable Components

3.3.3 Reusable Component Mining from Similar Potential Ones


Figure 3.5: An example of a dendrogram tree

aims at traveling through the built dendrogram, in order to extract the best clusters, representing a partition.

To build a dendrogram, the algorithm starts by considering individual components as initial leaf nodes in a binary tree. Next, the two most similar nodes are grouped into a new node, which becomes their parent. For example, in Figure 3.5, C2 and C3 are grouped. This continues until all nodes are grouped under the root of the dendrogram. Algorithm 2 presents the procedure used to gather similar components into a dendrogram. It takes a set of potential components as input. The result of this algorithm is a hierarchical tree representation of candidate clusters.
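The bottom-up grouping described above can be sketched as follows. This is only an illustration, not Algorithm 2 itself: the `Node` type, the `jaccard` similarity and the choice of representing a merged node by the union of its children's classes are all assumptions made for the example, since the similarity measure is defined elsewhere in the document.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Node:
    classes: FrozenSet[str]            # classes covered by this node (assumed representation)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def jaccard(a: Node, b: Node) -> float:
    """Assumed similarity: common classes divided by all classes of both nodes."""
    union = a.classes | b.classes
    return len(a.classes & b.classes) / len(union) if union else 0.0


def build_dendrogram(components: List[FrozenSet[str]],
                     sim: Callable[[Node, Node], float] = jaccard) -> Node:
    """Repeatedly merge the two most similar nodes until a single root remains."""
    nodes = [Node(c) for c in components]
    if not nodes:
        raise ValueError("at least one component is required")
    while len(nodes) > 1:
        # Pick the pair of current nodes with the highest similarity.
        a, b = max(combinations(nodes, 2), key=lambda pair: sim(*pair))
        nodes = [n for n in nodes if n is not a and n is not b]
        # The new node becomes the parent of the two merged nodes.
        nodes.append(Node(a.classes | b.classes, left=a, right=b))
    return nodes[0]
```

Each iteration scans all pairs, so the sketch favours readability over speed; a real implementation would likely cache pairwise similarities.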

To identify the best clusters, a depth-first search algorithm is used to travel through the dendrogram tree. It starts from the tree root to find the cut-off points. At each step, it compares the similarity of the current node with that of its children. If the current node has a similarity value exceeding the average similarity value of its children, then the cut-off point is at the current node, where the children minimize the quality function value. Otherwise, the algorithm continues through its children. Algorithm 3 presents the procedure used to extract clusters of components from a dendrogram. The results of this algorithm are clusters, where each one groups a set of similar components.


Algorithm 3: Dendrogram Traversal
Input: Dendrogram Tree (dendrogram)
Output: A Set of Clusters of Potential Components (clusters)
Stack traversal;
traversal.push(dendrogram.getRoot());
while (!traversal.isEmpty()) do
    Node father = traversal.pop();
    Node left = dendrogram.getLeftSon(father);
    Node right = dendrogram.getRightSon(father);
    if similarity(father) > (similarity(left) + similarity(right)) / 2 then
        clusters.add(father)
    else
        traversal.push(left);
        traversal.push(right);
    end
end
return clusters
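A runnable sketch of the traversal in Algorithm 3 is given below. The `Node` type with a stored `similarity` value is an assumption made for the example, and so is the handling of leaves: the pseudocode does not say what happens when a node has no children, so here a leaf is simply kept as a singleton cluster.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    similarity: float                  # similarity value attached to the node (assumed field)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def extract_clusters(root: Node) -> List[Node]:
    """Depth-first search for cut-off points, following Algorithm 3."""
    clusters: List[Node] = []
    stack = [root]
    while stack:
        father = stack.pop()
        left, right = father.left, father.right
        # Assumption: a leaf has no children to compare with, so it is
        # returned as a singleton cluster.
        if left is None or right is None:
            clusters.append(father)
            continue
        # Cut here when the node beats the average similarity of its children.
        if father.similarity > (left.similarity + right.similarity) / 2:
            clusters.append(father)
        else:
            stack.append(left)
            stack.append(right)
    return clusters
```

Note that the average is computed as `(left + right) / 2`, matching the prose description of the comparison.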

3.3.3.1 Method to Identify Reusable Component Based on Similar Ones

Classes composing similar components are classified into two types. The first type consists of classes that are shared by these components. We call these Shared classes. In Figure 3.6, C3, C4, C8 and C9 are examples of Shared classes. The second type is composed of the other classes, which vary between the components. These are called Non-Shared classes. C1, C2 and C10 are examples of Non-Shared classes in the cluster presented in Figure 3.6.
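As a minimal sketch, the two types can be computed with set operations: Shared classes are the intersection of the components' class sets, and Non-Shared classes are the rest of their union. The function name `split_classes` and the component memberships in the usage example are hypothetical; the memberships were chosen only to agree with the classes named for Figure 3.6, which is not reproduced here.

```python
from typing import FrozenSet, Iterable, List, Tuple


def split_classes(cluster: List[Iterable[str]]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (shared, non_shared) class sets for a cluster of components."""
    sets = [frozenset(component) for component in cluster]
    if not sets:
        return frozenset(), frozenset()
    shared = frozenset.intersection(*sets)       # classes present in every component
    all_classes = frozenset.union(*sets)         # classes present in at least one
    return shared, all_classes - shared


# Hypothetical cluster of three similar components.
cluster = [
    {"C1", "C2", "C3", "C4", "C8", "C9"},
    {"C2", "C3", "C4", "C8", "C9"},
    {"C3", "C4", "C8", "C9", "C10"},
]
shared, non_shared = split_classes(cluster)
print(sorted(shared), sorted(non_shared))
```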

Since Shared classes are identified in several products as part of one component, we consider that they form the core of the reusable component. Thus C3, C4, C8 and C9 should be included in the component identified from the cluster presented in Figure 3.6. However, these classes alone may not form a correct component according to our quality measurement model. Thus some Non-Shared classes need to be added to the reusable component in order to keep its quality high. The selection of a Non-Shared class to be included in the component is based on the following criteria:

• The quality of the component obtained by adding a Non-Shared class to the core ones. This criterion aims at increasing the component quality. Therefore, classes maximizing the quality function value are preferred for addition to the component.

• The density of a Non-Shared class in a cluster of similar components. This refers to the occurrence ratio of the class across the components of the cluster. It is calculated as the number of components including the class divided by the total number of components composing the cluster. We consider that a class having a high density value contributes to building a reusable component, since it keeps the component belonging to a larger number of products. For example, in Figure 3.6, the densities of C2 and C1 are respectively 67% (2/3) and 33% (1/3). Thus C2 is preferred over C1 for inclusion in the component, since C2 keeps the reusable component belonging to two products, while C1 keeps it belonging to only one product.
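The density criterion can be illustrated with a short sketch. The function name `class_density` and the component memberships are hypothetical; the memberships were picked only so that C2 appears in two of three components and C1 in one, as stated for Figure 3.6.

```python
from typing import Dict, List, Set


def class_density(cluster: List[Set[str]]) -> Dict[str, float]:
    """Fraction of the cluster's components that contain each class."""
    if not cluster:
        return {}
    total = len(cluster)
    all_classes = set().union(*cluster)
    return {c: sum(c in component for component in cluster) / total
            for c in sorted(all_classes)}


# Hypothetical cluster of three similar components.
cluster = [
    {"C1", "C2", "C3", "C4", "C8", "C9"},
    {"C2", "C3", "C4", "C8", "C9"},
    {"C3", "C4", "C8", "C9", "C10"},
]
density = class_density(cluster)
print(density["C2"], density["C1"])   # 2/3 and 1/3 for this hypothetical cluster
```

Shared classes always have a density of 1, so the criterion only discriminates between Non-Shared classes.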

Figure 3.6: An example of a cluster of similar components

3.3.3.2 Algorithm Providing Optimal Solution for Reusable Component Identification

Based on the method given in the previous section, an optimal solution that considers the two criteria of Non-Shared class selection can be given through the following algorithm. First, for each cluster of similar components, we extract all candidate subsets of classes among the set of Non-Shared ones. Then, only the subsets that reach a predefined density threshold are selected. The density of a subset is the average of the densities of all classes in this subset. Next, we evaluate the quality of the component formed by grouping the core classes with the classes of each subset resulting from the previous step. The subset maximizing the quality value is grouped with the core classes to form the reusable component. Only components whose quality value reaches a predefined threshold are retained. Algorithm 4 shows this procedure, where Q refers to the component quality fitness function of ROMANTIC, Q-threshold refers to the predefined quality threshold and D-threshold refers to the predefined density threshold.

Nevertheless, the above algorithm has exponential time complexity: enumerating all subsets of a collection of n classes is O(2ⁿ). This means that the computing time is acceptable only for components with a small number of Non-Shared classes, and the algorithm does not scale to a large number of Non-Shared classes. For example, 10 Non-Shared classes yield 1,024 subsets to evaluate, while 20 classes yield 1,048,576.

Algorithm 4: Optimal Solution for Mining Reusable Components
Input: Clusters of Components (clusters)
Output: A Set of Reusable Components (RC)
for each cluster ∈ clusters do
    shared = cluster.getFirstComponent().getClasses();
    allClasses = ∅;
    for each component ∈ cluster do
        shared = shared ∩ component.getClasses();
        allClasses = allClasses ∪ component.getClasses();
    end
    nonShared = allClasses − shared;
    allSubsets = generateAllsubsets(nonShared);
    reusableComponent = shared;
    bestComp = reusableComponent;
    for each subset ∈ allSubsets do
        if Density(subset) > D-threshold then
            if Q(reusableComponent ∪ subset) > Q(bestComp) then
                bestComp = reusableComponent ∪ subset;
            end
        end
    end
    if Q(bestComp) >= Q-threshold then
        add(RC, bestComp);
    end
end
return RC
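Algorithm 4 can be approximated by the following sketch. The quality function of ROMANTIC is not reproduced here, so `quality` is a caller-supplied placeholder, and the function and parameter names are assumptions made for the example. The subset enumeration makes the 2ⁿ cost explicit.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, Iterator, List

Classes = FrozenSet[str]


def all_subsets(items: Iterable[str]) -> Iterator[Classes]:
    """Every subset of items, including the empty one: 2**n subsets in total."""
    pool = sorted(items)
    tuples = chain.from_iterable(combinations(pool, r) for r in range(len(pool) + 1))
    return (frozenset(t) for t in tuples)


def mine_reusable_components(clusters: List[List[Iterable[str]]],
                             quality: Callable[[Classes], float],  # placeholder for Q
                             q_threshold: float,
                             d_threshold: float) -> List[Classes]:
    reusable: List[Classes] = []
    for cluster in clusters:
        sets = [frozenset(component) for component in cluster]
        if not sets:
            continue
        shared = frozenset.intersection(*sets)            # core classes
        non_shared = frozenset.union(*sets) - shared
        density = {c: sum(c in s for s in sets) / len(sets) for c in non_shared}

        best = shared
        for subset in all_subsets(non_shared):
            if not subset:
                continue  # the empty subset is already covered by best = shared
            avg_density = sum(density[c] for c in subset) / len(subset)
            if avg_density > d_threshold and quality(shared | subset) > quality(best):
                best = shared | subset
        if quality(best) >= q_threshold:
            reusable.append(best)
    return reusable
```

With a toy quality function such as `lambda classes: len(classes) / 10`, the function can be exercised on small clusters, but the result is only meaningful once a real quality measure is plugged in.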

3.3.3.3 Algorithm Providing Near-Optimal Solution for Reusable Component Identification

As a consequence of the complexity of the algorithm given in the previous section (the optimal result algorithm), we defined a heuristic algorithm as an alternative. This algorithm works as follows. First, Non-Shared classes are evaluated based on their density. The classes that do not reach a predefined density threshold are rejected. Then, we identify the largest subset that reaches a predefined quality threshold when it is added to the core classes. To identify this subset, we take the set composed of all remaining Non-Shared classes as the initial one. This subset is grouped with the core classes to form a component. If this component reaches the predefined quality threshold, then it represents the reusable component. Otherwise, we remove the Non-Shared class having the lesser quality value compared


In the document Supporting Reuse by Reverse Engineering Software Architectures and Components from Object-Oriented Product Variants and APIs (Pages 51-55)