Consistent biclustering - Support Vector Machines

Support Vector Machines

7.2 Consistent biclustering

In this section, the notion of consistent biclustering is introduced. This part of the chapter makes a large use of mathematical symbols: the symbology utilized follows.

As already observed, the set of clustersSr and the set of clustersFr represent two partitions of the samples and of the features of a set of data. Each clusterSr orFr

has a certain center. Since we have to deal with two different partitions (samples and features), let us denote the center of the generic clusterSr with the symbolc^S_r and the center of the generic clusterF_r with the symbolc^F_r . The centerc^S_r refers to ther^{t h}cluster of the samples. Since it is the average of samples represented by m-dimensional vectors,c_r^Sis anm-dimensional vector. These vectors can be organized into anm×kmatrixC_S, where the centers are stored column by column, just as the samples in the matrixA. The same can be done in correspondence of the clustersF_r and their centers. The generic centerc^F_r refers to ther^{t h}cluster of features. A matrix CF can be defined where such centers are organized column by column.CF is an n×kmatrix, since each feature is represented by ann-dimensional vector. Since the matricesCSandCF contain averages, their elements are the average expressions of the corresponding samples and features. It is clear that the nomenclature “average expression’’ comes from the studies on gene expression data. An average expression can be evaluated by a non-negative number: we will suppose in the following that all the centers have non-negative values.

Matrices are widely used in biclustering:Acontains the set of data to partition in biclusters;C_SandCFcontain the centers of the clustersSr andFr, respectively.aij

refers to thei^{t h}feature of thej^{t h}sample. A sample can be referred to asa^j: thejas superscript means that thejrefers to the column index of the matrixA. Similarly,a_i refers to thei^{t h}row of the matrix, i.e., to thei^{t h}feature. The same symbology can be used for elements inC_SandC_F.c^S_irrefers to thei^{t h}component of the center of the clusterSr;c^F_{j r}refers to thej^{t h}component of the center of the clusterFr.

As already pointed out, the two single clusters in a bicluster(Sr, Fr)are related.

Actually, once a partition in clusters of the samples is provided, a corresponding partition in clusters of the features can be obtained. Vice versa, a partition in clusters S_r can be obtained from the clustersFr. Let us suppose then that the clustersSr are known. In this case, each sample or columna^j is assigned to a certain cluster. The centers of all the clustersS_r are also known and contained in the matrixC_S column by column. The generic elementc^S_irof the matrix represents the average expression of thei^{t h}feature in ther^{t h}cluster, among all the samples inSr. Letrˆbe the cluster in which thei^{t h}feature is most expressed. In mathematical formulas,rˆcan be defined as the index such that the following condition is satisfied:

ai ∈F_r_ˆ ⇐⇒ c^S_i_r_ˆ> c^S_iξ ∀ξ ∈ {1,2, . . . , k} ξ = ˆr . (7.2) Intuitively, it is reasonable to assign the featurea_i to the clusterF_r_ˆ. If the condition (7.2) is applied for all the indicesi∈ {1,2, . . . , k}and all the featuresaiare assigned to the corresponding clustersF_r_ˆ, a partition in clustersF_ris obtained from a previous partition in clustersS_r.

The same procedure can be applied for obtaining a partition of the samples when a partition of the features is known. The following rule can be used for assigning a samplea^j to a certain clusterSˆr:

a^j ∈ ˆS_r_ˆ ⇐⇒ c_j^F_r_ˆ> c_{j ξ}^F ∀ξ ∈ {1,2, . . . , k} ξ = ˆr . (7.3) If this rule is applied for eachj, a new partition in clustersSˆr is obtained from the partition in clustersFr. Note that a symbol is used for discriminating the generic clusterSrand the generic clusterSˆr. Indeed,Sris the generic cluster used for finding a partition in clustersFrof the features, whereasSˆrrepresents the partition in clusters obtained from the clustersFr. Two different notations forSrandSˆr are used because these two partitions of samples can be different in general. Even thoughSrgenerated Fr andFr generatedSˆr, there are no reasons whySr andSˆr should correspond. If they correspond, then the partition in biclusters(Sr, Fr)is calledconsistent.

It is important to note that not all the sets of data admit a consistent partition in biclusters. This may happen because there may not be a statistical evidence that a sample or a feature belongs to a certain cluster. If a consistent partition in biclusters exists for a certain set of data, then it is said to bebiclustering-admitting. When it is not the case, samples or features are usually deleted from the set of data for letting it become biclustering-admitting. In this case, it is important to delete the least possible in order to preserve the information in the set of data. This procedure is known as feature selection.

The requirement of consistency can be weak in some cases. Let us suppose that a partition in clustersSris available, and that a partition in clustersFris obtained from it. Each feature is therefore assigned to the clusterF_r_ˆ such thatc^S_i_r_ˆhas the largest value in the vectorc^S_i. Let us suppose now that the following condition holds:

min

ξ=ˆr{c_i^S_r_ˆ−c^S_iξ} ≤ε (7.4)

whereεis a small number. In this case, small changes in the data can bring different partitions of the features in the clustersFr. Indeed, small variations of the samples bring variations of the centers of the clustersSr, and this can bring a different feature to be more expressed. The following example should clarify this concept.

Let us suppose that the data are partitioned in two biclusters only.S₁andS2are known, as well as their centersc^S₁andc^S₂. The features are also partitioned into two clustersF₁andF₂. Each feature is assigned to one of the two clusters depending on their average expressions in the corresponding clustersS_r. Therefore, the generic featurea_i is assigned toF1ifc_i^S₁> c^S_i2, and vice versa. Let us suppose for instance thatc^S_i1 = 5.9 andc^S_i2 =6.1. Then,ai is assigned toF2. However, the condition (7.4) holds withα≥0. This means that it is not evident statistically thatai belongs toF2. Indeed, let us suppose that another sample is added to the set of data, and that it is assigned to clusterS1. The center ofS1hence changes, and in particular itsi^{t h}component changes. If the featureai is more expressed in this sample, the averagec^S_i1can increase. Since it is an average and it considers all the samples in the same cluster, it cannot change dramatically, even though the new sample might be different from the others. However, in the considered example, the featureai might be assigned to a different cluster after the new sample is added. If indeedc^S_i1is now equal to 6.2, thenc^S_i1> c^S_i2, and the featurea_i is assigned toF₁.

In order to overcome this kind of problem, conditions stronger than consistent biclustering are introduced in [176]. A biclustering is called anadditive consistent biclusteringwith parameterαor anα-consistent biclusteringif the following two relations holds

ai ∈ ˆF_r_ˆ ⇐⇒ c_i^S_r_ˆ> α_j^F +c_iξ^S ∀ξ ∈ {1,2, . . . , k} ξ = ˆr (7.5) a^j ∈ ˆS_r_ˆ ⇐⇒ c^F_j_r_ˆ> α^S_i +c^F_{j ξ} ∀ξ ∈ {1,2, . . . , k} ξ = ˆr (7.6) where eachα^F_j andα_i^Sare positive numbers. It is easy to prove that anα-consistent biclustering is a consistent biclustering, but not the inverse. Indeed, if the conditions (7.5) and (7.6) are satisfied withα_j^F >0 andα_i^S>0, then they keep being satisfied withα_j^F = 0 and α_i^S = 0. Inversely, let us suppose thatc^S_i_r_ˆ > α^F_j +c^S_iξ for all theξ different fromr, in correspondence with some featureˆ aiand withα^F_j =0. If α_j^F is successively modified and it becomes positive, then the condition may not be satisfied anymore. The quantityα_j^F+c^S_iξ becomes larger, and therefore the quantity c^S_i_r_ˆmay not be greater than it anymore.

Similar toα-consistent biclustering is theβ-consistent biclustering. A biclustering is called amultiplicative consistent biclusteringwith parameterβor aβ-consistent biclusteringif the following two relations holds

a_i ∈ ˆF_r_ˆ ⇐⇒ c_i^S_r_ˆ> β_j^Fc_iξ^S ∀ξ ∈ {1,2, . . . , k} ξ = ˆr (7.7) a^j ∈ ˆS_r_ˆ ⇐⇒ c^F_j_r_ˆ> β_i^Sc^F_{j ξ} ∀ξ ∈ {1,2, . . . , k} ξ = ˆr (7.8)

whereβ_j^F > 1 andβ_i^S > 1. As before, aβ-consistent biclustering is a consistent biclustering.

Dans le document DATA MINING IN AGRICULTURE (Page 167-170)