
Topology Representing Network Map

3.5 Topology Representation

3.5.6 Topology Representing Network Map

In summary, all of the previously introduced methods are reasonable choices for topology-based dimensionality reduction, but each of them has some disadvantages. Isomap cannot model multi-class problems and it is not efficient on large and noisy data sets. The main disadvantage of the OVI-NG and GNLP-NG methods is that they use a non-metric mapping method, so only the rank ordering of the representatives is preserved during the mapping process. Isotop can fall into local minima and requires some care in its parametrisation [53].

CDA is a more complicated technique, yet it still needs to be well parameterised [56]. Furthermore, the OVI-NG and CCA methods are not able to uncover non-linearly embedded manifolds.

Topology Representing Network Map (TRNMap) [57, 58] refers to a group of unsupervised nonlinear mapping methods that combine the TRN algorithm with multidimensional scaling to visualise the data structure. As a result, it gives a compact representation of the data set to be analysed. The method aims to fulfil the following three criteria:

• give a low-dimensional representation of the data,

• preserve the intrinsic data structure (topology), and

• according to the user's expectations, preserve the distances or the rank ordering of the objects.

The TRNMap mapping method results in a visualisation map, called the Topology Representing Network Map (TRNMap). TRNMap is a self-organizing model with no predefined structure which provides an expressive presentation of high-dimensional data in a low-dimensional vector space. The dimensionality of the input space is not restricted. Although the method is able to provide an output map of arbitrary dimension, a 2-dimensional or 3-dimensional output map is recommended for visualising the data structure.

The Topology Representing Network Map algorithm is based on graph distances; therefore it is able to handle data lying on a low-dimensional manifold that is nonlinearly embedded in a higher-dimensional input space. To preserve the intrinsic data structure, TRNMap computes the dissimilarities of the data points from the graph distances. To compute the graph distances, the data set is replaced by the graph produced by applying the TRN algorithm to the data set. The edges of the graph are labelled with their Euclidean length, and Dijkstra's algorithm [59] is run on the graph to compute the shortest path for each pair of points.

The TRNMap algorithm utilises the group of multidimensional scaling mapping algorithms to give the low-dimensional representation of the data set. If the aim of the mapping is to visualise the distances of the objects or their representatives, TRNMap utilises the metric MDS method. On the other hand, if the user is only interested in the ordering relations of the objects, TRNMap uses non-metric MDS for the low-dimensional representation. As a result, it gives compact, low-dimensional, topology-preserving feature maps for exploring the hidden structure of the data. In the following, the TRNMap algorithm is introduced in detail.
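The graph-distance computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `euclidean` and `graph_distances` and the toy node and edge lists are assumptions made for the example. In TRNMap the nodes and edges would come from the TRN algorithm.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean length of the edge between points a and b."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def graph_distances(nodes, edges):
    """All-pairs shortest-path lengths on an undirected graph whose
    edges are weighted by their Euclidean length (Dijkstra per source)."""
    n = len(nodes)
    adjacency = [[] for _ in range(n)]
    for i, j in edges:
        w = euclidean(nodes[i], nodes[j])
        adjacency[i].append((j, w))
        adjacency[j].append((i, w))
    dist = [[math.inf] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[source][u]:
                continue  # stale queue entry, a shorter path was already found
            for v, w in adjacency[u]:
                nd = d + w
                if nd < dist[source][v]:
                    dist[source][v] = nd
                    heapq.heappush(heap, (nd, v))
    return dist

# Toy example (hypothetical data): four nodes joined into an open chain.
nodes = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3)]
D = graph_distances(nodes, edges)
```

In this toy graph the first and last nodes are one unit apart in the Euclidean sense, but the graph distance between them follows the chain. This is the property that lets graph distances describe a manifold rather than the embedding space.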

Given a set of data $X = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^D$. The main goal of the algorithm is to give a compact, perspicuous representation of the objects. For this purpose the set $X$ is represented in a lower-dimensional output space by a new set of objects ($Y$), where $Y = \{y_1, y_2, \ldots, y_n\}$, $n \le N$ ($y_i \in \mathbb{R}^d$, $d \ll D$).

To avoid the influence of the range of the attributes, a normalisation procedure is suggested as a preparatory step (Step 0). After the normalisation the algorithm creates the topology representing network of the input data set (Step 1). This is achieved by the use of the Topology Representing Network proposed by Martinetz and Schulten [60].
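The text does not say which normalisation is meant in Step 0. The sketch below assumes min-max scaling of each attribute to [0, 1], which is one common way to remove the influence of attribute ranges. The function name `min_max_normalise` and the sample data are hypothetical.

```python
def min_max_normalise(data):
    """Rescale every attribute (column) of `data` to the range [0, 1].

    Assumption: min-max scaling; the source only asks for 'a normalisation'.
    """
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalised = []
    for row in data:
        normalised.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0  # constant attribute maps to 0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalised

# Two attributes with very different ranges (hypothetical values).
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 1000.0]]
X_norm = min_max_normalise(X)
```

After scaling, both attributes contribute on the same scale to the Euclidean distances used by the TRN and by the edge lengths of the graph.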

The number of nodes (representatives) of the TRN is determined by the user. By using the TRN, this step ensures that the correct structure of the data set is explored, and it also includes a vector quantisation. Contrary to the ε-neighbouring and k-neighbouring algorithms, the graph resulting from the TRN algorithm does not depend on the density of the objects or on the selected number of neighbours. If the resulting graph is unconnected, the TRNMap algorithm connects the subgraphs by linking the closest elements (Step 2). Then the pairwise graph distances are calculated between every pair of representatives (Step 3). Next, the original topology representing network is mapped into a 2-dimensional graph (Step 4). The mapping method utilises the similarity of the data points provided by the previously calculated graph distances. This mapping can be carried out by either metric or non-metric multidimensional scaling. For an expressive visualisation, component planes are also created from the D-dimensional representatives (Step 5).
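As an illustration of the mapping in Step 4, the sketch below applies classical (Torgerson) scaling, one metric MDS variant, to a matrix of graph distances. Whether TRNMap uses exactly this variant is not stated in this section, so treat the choice as an assumption. The function name `classical_mds` and the toy distance matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed n objects in `dim` dimensions from an n x n distance matrix
    using classical scaling (double centring plus eigendecomposition)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]    # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop tiny negatives
    return eigvecs[:, order] * scale

# Graph distances of four representatives joined into an open chain
# with unit-length edges (hypothetical input).
D_chain = [[0, 1, 2, 3],
           [1, 0, 1, 2],
           [2, 1, 0, 1],
           [3, 2, 1, 0]]
Y = classical_mds(D_chain, dim=2)
```

Because the chain's graph distances are exactly those of four points on a straight line, the embedding reproduces them without error. Graph distances taken from real data are generally only approximated by a 2-dimensional map.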

Algorithm 13 Topology Representing Network Map algorithm

Step 0 Normalize the input data set X.

Step 1 Create the Topology Representing Network of X by the use of the TRN algorithm [60].

Yield $M^{(D)} = (W, C)$ the resulted graph, let $w_i \in W$ be the representatives (codebook vectors) of $M^{(D)}$. If exists an edge between the representatives $w_i$ and $w_j$ ($w_i, w_j \in W$, $i \neq j$), $c_{i,j} = 1$, otherwise $c_{i,j} = 0$.

Step 2 If $M^{(D)}$ is not connected, connect the subgraphs in the following way:

While there are unconnected subgraphs ($m_i^{(D)} \subset M^{(D)}$, $i = 1, 2, \ldots$):

(a) Choose a subgraph $m_i^{(D)}$.

(b) Let the terminal node $t_1 \in m_i^{(D)}$ and its closest neighbor $t_2 \notin m_i^{(D)}$ from:

$\|t_1 - t_2\| = \min \|w_j - w_k\|, \quad t_1, w_j \in m_i^{(D)}, \quad t_2, w_k \notin m_i^{(D)}$

(c) Set $c_{t_1,t_2} = 1$.

End while

Yield $M^{*(D)}$ the modified $M^{(D)}$.

Step 3 Calculate the geodesic distances between all $w_i, w_j \in M^{*(D)}$.

Step 4 Map the graph $M^{(D)}$ into a 2-dimensional vector space by metric or non-metric MDS based on the graph distances of $M^{*(D)}$.

Step 5 Create component planes for the resulting Topology Representing Network Map based on the values of $w_i \in M^{(D)}$.
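Step 2 of the algorithm can be sketched as below. This is a simple illustration under stated assumptions, not the authors' code: the helper names `components` and `connect_subgraphs` are hypothetical, the subgraph chosen in (a) is simply the first component found, and the closest pair in (b) is found by exhaustive search. It relies on `math.dist`, available from Python 3.8.

```python
import math

def components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1."""
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], {start}
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        result.append(comp)
    return result

def connect_subgraphs(nodes, edges):
    """Add edges between closest representatives until the graph is connected."""
    edges = set(edges)
    while True:
        comps = components(len(nodes), edges)
        if len(comps) == 1:
            return edges
        inside = comps[0]                                   # (a) choose a subgraph
        outside = [k for k in range(len(nodes)) if k not in inside]
        t1, t2 = min(                                       # (b) closest pair across the cut
            ((j, k) for j in inside for k in outside),
            key=lambda p: math.dist(nodes[p[0]], nodes[p[1]]),
        )
        edges.add((t1, t2))                                 # (c) set c_{t1,t2} = 1

# Two separate subgraphs on a line (hypothetical representatives).
W = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
C = {(0, 1), (2, 3)}
C_star = connect_subgraphs(W, C)
```

Each pass merges the chosen subgraph with at least one other, so the loop ends after at most one pass per initial subgraph.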

The parameters of the TRNMap algorithm are the same as those of the Topology Representing Network algorithm. The number of nodes of the output graph (n) is determined by the user. The larger n is, the more detailed the output map will be. The suggested choice is n = 0.2N, where N denotes the number of original objects. If the number of input data elements is high, this can result in numerous nodes. In such cases it is practical to decrease the number of representatives and run the algorithm iteratively to capture the structure more precisely. The values of the other TRN parameters (λ, the step size ε, and the threshold value T of the edges' ages) can be the same as proposed by Martinetz and Schulten [60].

Figure 3.14 shows the 2-dimensional structure of the S curve data set created by the TRNMap method. As TRNMap algorithm utilises geodesic distances to calculate the pairwise dissimilarities of the quantised data, this method is able to unfold the real 2-dimensional structure of the S curve data set.

Fig. 3.14 TRNMap visualisation of S curve data set

Fig. 3.15 TRNMap component planes of S curve data set. a Dimension X. b Dimension Y. c Dimension Z

Besides the visualisation of the data structure, the nodes of TRNMap also visualise high-dimensional information by the use of the component plane representation.

The component planes of the 3-dimensional S curve data set produced by TRNMap are shown in Fig. 3.15. A component plane displays the value of one component of each node. If the input data set has D attributes, the Topology Representing Network Map component plane includes D different maps, one for each of the D components. The structure of each map is identical to the map produced by the TRNMap algorithm, but the nodes are shown in grayscale: white corresponds to the smallest value of the attribute and black to the greatest. By viewing several component maps at the same time it is also easy to see simple correlations between attributes. Because the nodes of TRNMap can be seen as possible cluster prototypes, TRNMap can provide the basis for an effective clustering method.
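The grayscale coding of the component planes can be illustrated with a short sketch. It assumes a linear scale between the smallest and greatest value of each attribute, which the text does not specify, and represents a grey level as a number with 1.0 for white and 0.0 for black. The function name `component_planes` and the codebook values are hypothetical.

```python
def component_planes(codebook):
    """Return one list of grey levels per attribute (1.0 = white, 0.0 = black).

    codebook: list of D-dimensional representatives (codebook vectors).
    Assumption: grey level varies linearly between the attribute's min and max.
    """
    planes = []
    for column in zip(*codebook):          # one column per attribute
        lo, hi = min(column), max(column)
        span = hi - lo
        planes.append([
            1.0 - (v - lo) / span if span > 0 else 1.0  # constant attribute: all white
            for v in column
        ])
    return planes

# Three representatives with D = 3 attributes (hypothetical values).
codebook = [[0.0, 10.0, 5.0],
            [1.0, 20.0, 5.0],
            [2.0, 40.0, 5.0]]
planes = component_planes(codebook)
```

Each returned list would colour the same node positions in the 2-dimensional map, giving D maps that differ only in shading.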

3.6 Analysis and Application Examples

In this section a comparative analysis of the previously introduced methods is given with some examples. The analysis is based on the evaluation of the mapping results for the following examples: the Swiss roll data set (see Appendix A.6.5), the Wine data set (see Appendix A.6.3) and the Wisconsin breast cancer data set (see Appendix A.6.4).

The mapping qualities of the algorithms are analysed based on the following two aspects:

• preservation of distance and neighbourhood relations of data, and

• preservation of local and global geometry of data.

In our analysis the distance preservation of the methods is measured by the classical MDS stress function, Sammon stress function and residual variance. The neighbourhood preservation and the local and global mapping qualities are measured by functions of trustworthiness and continuity.
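Two of the distance-preservation measures named above can be sketched as follows. The formulas are the commonly used definitions (Sammon stress as the normalised sum of squared distance errors weighted by the inverse input distance, residual variance as one minus the squared linear correlation between input and output distances). They are assumed here because this section does not restate the exact definitions. The function names and sample distances are hypothetical; both functions take flat lists of pairwise distances over the same pairs i < j.

```python
def sammon_stress(d_in, d_out):
    """Sammon stress between input-space and output-space distances."""
    total = sum(d_in)
    return sum((a - b) ** 2 / a for a, b in zip(d_in, d_out) if a > 0) / total

def residual_variance(d_in, d_out):
    """1 - R^2, with R the linear correlation of the two distance lists."""
    n = len(d_in)
    mean_in = sum(d_in) / n
    mean_out = sum(d_out) / n
    cov = sum((a - mean_in) * (b - mean_out) for a, b in zip(d_in, d_out))
    var_in = sum((a - mean_in) ** 2 for a in d_in)
    var_out = sum((b - mean_out) ** 2 for b in d_out)
    return 1.0 - cov * cov / (var_in * var_out)

# Hypothetical pairwise distances before and after a mapping.
d_input = [1.0, 2.0, 3.0]
d_exact = [1.0, 2.0, 3.0]     # distances reproduced exactly
d_scaled = [2.0, 4.0, 6.0]    # distances doubled by the mapping

stress_exact = sammon_stress(d_input, d_exact)
stress_scaled = sammon_stress(d_input, d_scaled)
resvar_scaled = residual_variance(d_input, d_scaled)
```

The doubled distances show how the two measures differ: Sammon stress penalises the change of scale, whereas residual variance ignores it because the distances remain perfectly correlated.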

All analysed visualisation methods require the setting of some parameters. The following principle is applied: identical input parameters of different mapping methods are set in the same way. The common parameters of the OVI-NG, GNLP-NG and TRNMap algorithms were set in all simulations as follows: $t_{max} = 200n$, $\varepsilon_i = 0.3$, $\varepsilon_f = 0.05$, $\lambda_i = 0.2n$, $\lambda_f = 0.01$, $T_i = 0.1n$. If the influence of the deletion of edges was not analysed, the value of parameter $T_f$ was set to $T_f = 0.5n$. The auxiliary parameters of the OVI-NG and GNLP-NG algorithms were set as $\alpha_i = 0.3$, $\alpha_f = 0.01$, $\sigma_i = 0.7n$, and $\sigma_f = 0.1$. The value of parameter $K$ in the GNLP-NG method was set to $K = 2$ in all cases.
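The common parameter settings listed above depend only on the number of representatives n, so they can be collected in a small helper. The sketch below is illustrative only: the function names, dictionary keys and the `t_f_factor` argument are hypothetical. The text also does not say how a parameter moves from its initial to its final value, so the exponential schedule in `decayed` is an assumption, chosen because it is the usual schedule for neural gas style algorithms.

```python
def trn_parameters(n, t_f_factor=0.5):
    """Common TRN parameter values reported in the text for n representatives."""
    return {
        "t_max": 200 * n,
        "eps_i": 0.3, "eps_f": 0.05,
        "lambda_i": 0.2 * n, "lambda_f": 0.01,
        "T_i": 0.1 * n, "T_f": t_f_factor * n,
    }

def decayed(initial, final, t, t_max):
    """Assumed schedule: initial * (final / initial) ** (t / t_max)."""
    return initial * (final / initial) ** (t / t_max)

params = trn_parameters(n=100)
# Step size halfway through training under the assumed schedule.
eps_mid = decayed(params["eps_i"], params["eps_f"],
                  params["t_max"] // 2, params["t_max"])
```

Passing a different `t_f_factor` corresponds to the experiments in which the influence of edge deletion is analysed and $T_f$ is varied.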