Bicluster enumeration using Formal Concept Analysis


HAL Id: hal-01095884

https://hal.inria.fr/hal-01095884

Submitted on 16 Dec 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Victor Codocedo, Amedeo Napoli

To cite this version:

Victor Codocedo, Amedeo Napoli. Bicluster enumeration using Formal Concept Analysis. What formal concept analysis can do for artificial intelligence? (FCA4AI 2014) Workshop at ECAI 2014, Aug 2014, Prague, Czech Republic. ⟨hal-01095884⟩

Víctor Codocedo and Amedeo Napoli

LORIA - CNRS - INRIA - Université de Lorraine, BP 239, 54506 Vandœuvre-lès-Nancy.

victor.codocedo@loria.fr, amedeo.napoli@loria.fr

Abstract. In this work we introduce a novel technique to enumerate constant row/column value biclusters using formal concept analysis. To achieve this, a numerical data-table (standard input for biclustering algorithms) is modelled as a many-valued context where rows represent objects and columns represent attributes. Using equivalence relations defined for each single column, we are able to translate the bicluster mining problem in terms of the partition pattern structure framework. We show how biclustering can benefit from the FCA framework through its robust theoretical description and efficient algorithms. Finally, we show how this technique is able to find high quality biclusters (in terms of the mean squared error) more efficiently than a state-of-the-art bicluster algorithm.

1 Introduction

Biclustering has become a fundamental tool for bioinformatics and gene expression analysis [4]. Unlike standard clustering, where objects are compared and grouped together based on their full descriptions, biclustering generates groups of objects based on a subset of their attributes, values or conditions. Thus biclusters are able to represent object relations at a local scale instead of the global representation given by an object cluster [12]. In this sense, biclustering has many elements in common with Formal Concept Analysis (FCA) [6]. In FCA, objects are grouped together by the attributes they share in what is called a formal concept. Furthermore, formal concepts are arranged in a hierarchical and overlapping structure called a concept lattice. Hence a formal concept can be considered as a bicluster of objects and attributes representing relations at a local scale, while the lattice structure gives a description at the global scale. FCA is not only analogous to biclustering, but has much to offer in terms of mining techniques and algorithms [10]. The concept lattice can also provide biclusters with an overlapping hierarchy, which has been reported as an important feature for bicluster analysis [15]. Recently, some approaches that use FCA algorithms to mine biclusters from a numerical data-table have been introduced, showing good potential [8, 7].

In this work, we present a novel technique for lattice-based biclustering using the pattern structure framework [5], an extension of FCA to deal with complex data. More specifically, we propose a technique for mining biclusters with similar row/column values, a specialization of biclustering focused on mining attributes with coherent variations, i.e. the difference between two attributes is the same for a group of objects [12]. We show that, by the use of partition pattern structures [1], we can find high quality maximal biclusters (w.r.t. the mean squared error). Finally, we compare our approach with a standard constant row value algorithm [3], showing the capabilities and limitations of our approach.

The remainder of this paper is organized as follows. The basics of biclustering are introduced in Section 2. Section 3 presents our approach and Section 4 presents the experiments and initial findings of our biclustering technique. Finally, Section 5 concludes our article and presents some new perspectives of research.

2 Biclustering definitions

A numerical data-table is a matrix $\mathbf{M}$ where $\mathbf{M}_{ij}$ indicates the value of an object $g_i \in G$ w.r.t. the attribute $m_j \in M$ with $i \in [1..|G|]$ and $j \in [1..|M|]$ ($|\cdot|$ represents set cardinality). A bicluster of $\mathbf{M}$ is a submatrix $B$ where each value $B_{ij}$ satisfies a given restriction. According to [4, 12], there are five different restrictions which we summarize in Table 1.

| Type | Restriction | Description |
|---|---|---|
| Constant values | $B_{ij} = c$ | Within the submatrix, all values are equal to a constant $c \in \mathbb{R}$ ($\mathbb{R}$ indicates real values). |
| Constant row values | $B_{ij} = c + \alpha_i$ | Within the submatrix, all the values in a given row $i$ are equal to a constant $c$ plus a row adjustment $\alpha_i \in \mathbb{R}$. |
| Constant column values | $B_{ij} = c + \alpha_j$ | Within the submatrix, all the values in a given column $j$ are equal to a constant $c$ plus a column adjustment $\alpha_j \in \mathbb{R}$. |
| Coherent values | $B_{ij} = c + \alpha_i + \beta_j$ | Within the submatrix, each value is the sum of a constant $c$, a row adjustment $\alpha_i$ and a column adjustment $\beta_j$. Instead of addition, the model can also consider multiplicative factors. |
| Coherent evolution | | Values in the submatrix induce a linear order. |

Table 1: Types of biclusters.

Similar values instead of constant values. When noise is present in a data-table, it is difficult to search for constant values. Several approaches have tackled this issue in different ways, e.g. by the use of evaluation functions [14], equivalence relations [2, 13] and tolerance relations [7]. The most common way is establishing a threshold $\theta \in \mathbb{R}$ to enable the similarity comparison of two different values $w_1, w_2 \in \mathbb{R}$. We say that $w_1 \simeq_\theta w_2$ (values are similar) iff $|w_1 - w_2| \le \theta$.

Thus, constant values are a special case of similar values when $\theta = 0$. Using this, we can redefine the first three types of biclusters as follows:

1. Similar values: $B_{ij} \simeq_\theta B_{kl}$.

2. Similar row/column values:

(a) Similar row values: $B_{ij} \simeq_\theta B_{il}$.

(b) Similar column values: $B_{ij} \simeq_\theta B_{kj}$.

Example 1. With $\theta = 1$, Table 2 shows in its upper left corner a bicluster with similar values (dark grey). The upper right corner represents a similar column bicluster (light grey). The lower left corner, considering $\{g_3, g_4\}$ and $\{m_1, m_2\}$ (not marked in the table), represents a similar row bicluster.

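| | $m_1$ | $m_2$ | $m_3$ | $m_4$ | $m_5$ |
|---|---|---|---|---|---|
| $g_1$ | 1 | 2 | 2 | 1 | 6 |
| $g_2$ | 2 | 1 | 1 | 0 | 6 |
| $g_3$ | 2 | 2 | 1 | 7 | 6 |
| $g_4$ | 8 | 9 | 2 | 6 | 7 |

Table 2: Bicluster with similar values ($\theta = 1$).

| | $m_1$ | $m_3$ |
|---|---|---|
| $g_2$ | 2 | 1 |
| $g_3$ | 2 | 1 |

Table 3: Constant column bicluster.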
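The three similarity restrictions can be checked mechanically on the data of Table 2. The following is a minimal Python sketch, not part of the original method. The function names are hypothetical, and since the grey shading of Table 2 is not recoverable here, the exact cell ranges assumed for the "upper left" and "upper right" corners are an assumption.

```python
from itertools import product

# Table 2 of the paper: rows g1..g4, columns m1..m5 (0-based indices below).
TABLE = [
    [1, 2, 2, 1, 6],  # g1
    [2, 1, 1, 0, 6],  # g2
    [2, 2, 1, 7, 6],  # g3
    [8, 9, 2, 6, 7],  # g4
]


def similar(w1, w2, theta):
    """w1 and w2 are similar iff |w1 - w2| <= theta."""
    return abs(w1 - w2) <= theta


def is_similar_values(rows, cols, theta, table=TABLE):
    """Every pair of cells of the submatrix is similar."""
    cells = [table[i][j] for i, j in product(rows, cols)]
    return all(similar(a, b, theta) for a, b in product(cells, cells))


def is_similar_rows(rows, cols, theta, table=TABLE):
    """Within each selected row, all selected values are similar."""
    return all(similar(table[i][j], table[i][l], theta)
               for i in rows for j in cols for l in cols)


def is_similar_columns(rows, cols, theta, table=TABLE):
    """Within each selected column, all selected values are similar."""
    return all(similar(table[i][j], table[k][j], theta)
               for j in cols for i in rows for k in rows)


# Assumed reading of the regions named in Example 1 (shading is not available).
UPPER_LEFT = ([0, 1, 2], [0, 1, 2])   # {g1, g2, g3} x {m1, m2, m3}
UPPER_RIGHT = ([0, 1], [3, 4])        # {g1, g2} x {m4, m5}
LOWER_LEFT = ([2, 3], [0, 1])         # {g3, g4} x {m1, m2}, as stated in the text

print(is_similar_values(*UPPER_LEFT, theta=1))
print(is_similar_columns(*UPPER_RIGHT, theta=1))
print(is_similar_rows(*LOWER_LEFT, theta=1))
```

Under these assumptions each region satisfies only the restriction named in Example 1: for instance the lower left region has similar rows but not similar columns, since $|2 - 8| > 1$ in column $m_1$.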

3 Biclustering using partition pattern structures

The pattern structure framework is an extension of FCA proposed to deal with complex data [5]. Partition pattern structures are an instance of the pattern structure framework proposed to mine functional dependencies among attributes of a database [1] dealing with set partitions. In the following, we provide the specifics of partition pattern structures where the main definitions are given in [5].

Let $G$ be a set of objects, $M$ a set of attributes and $\mathbf{M}$ a data-table of numerical values where $\mathbf{M}_{ij}$ contains the value of attribute (column) $m_j \in M$ in object (row) $g_i \in G$. A partition $d = \{p_i\}$ of the set $G$ can be formalized as a collection of components $p_i$ such that:

$$\bigcup_{p_i \in d} p_i = G \qquad p_i \cap p_j = \emptyset \quad (p_i, p_j \in d,\ i \neq j)$$

Two partitions can be ordered by the coarser-finer relation where we say that a partition $d_1 = \{p_i\}$ is a refinement of $d_2 = \{p_j\}$ (or $d_2$ is a coarsening of $d_1$) iff $\forall p_i \in d_1, \exists p_j \in d_2, p_i \subseteq p_j$. We denote this as $d_1 \sqsubseteq d_2$ where $d_1, d_2 \in D$ and $D$ is the space of all partitions of the set $G$.

Let us define the mapping function $\delta : M \to D$, which assigns to each attribute in $M$ the partition it generates over the set of objects $G$, as follows:

$$\delta(m_j) = \{ [g_i]_{m_j} \mid g_i \in G \} \qquad (1)$$

$$[g_i]_{m_j} = \{ g_k \in G \mid \mathbf{M}_{ij} = \mathbf{M}_{kj} \} \qquad (2)$$

where $[g_i]_{m_j}$ is the equivalence class of $g_i$ w.r.t. attribute $m_j$, i.e. the set of rows in data-table $\mathbf{M}$ which have the same value in column $m_j$ as row $g_i$. Since the set of equivalence classes for a given attribute generates a partition over $G$, it follows naturally that $\delta(m_j) \in D$ for any $m_j \in M$.
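As an illustration of Equations 1 and 2, the partition induced by each column of Table 2 can be computed by grouping objects on their column value. This is a minimal Python sketch; the names `OBJECTS`, `COLUMNS`, `delta` and `show` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict

# Table 2 of the paper, stored column-wise (values listed for g1, g2, g3, g4).
OBJECTS = ("g1", "g2", "g3", "g4")
COLUMNS = {
    "m1": (1, 2, 2, 8),
    "m2": (2, 1, 2, 9),
    "m3": (2, 1, 1, 2),
    "m4": (1, 0, 7, 6),
    "m5": (6, 6, 6, 7),
}


def delta(m):
    """Partition of the objects into classes of equal value in column m."""
    classes = defaultdict(set)
    for g, value in zip(OBJECTS, COLUMNS[m]):
        classes[value].add(g)
    return frozenset(frozenset(c) for c in classes.values())


def show(d):
    """Deterministic, readable rendering of a partition."""
    return sorted(sorted(p) for p in d)


for m in COLUMNS:
    print(m, show(delta(m)))
```

For instance, $\delta(m_3)$ groups $g_1$ with $g_4$ (value 2) and $g_2$ with $g_3$ (value 1), while $\delta(m_4)$ consists of four singletons because all values in that column differ.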

It is easy to show that the order in the space of object partitions $D$ defines a complete lattice for which the similarity operator $\sqcap$ for any two partitions $d_1, d_2 \in D$ is defined as follows:

$$d_1 \sqcap d_2 = \bigcup_{p_i \in d_1,\ p_j \in d_2} \{ p_i \cap p_j \} \qquad (3)$$

$$d_1 \sqsubseteq d_2 \iff d_1 \sqcap d_2 = d_1 \qquad (4)$$

Then, a partition pattern structure is determined by the triple $(M, (D, \sqcap), \delta)$ in which the following derivation operators for $B \subseteq M$ and $d \in D$ are defined:

$$B^\square = \bigsqcap_{m \in B} \delta(m) \qquad (5)$$

$$d^\square = \{ m \in M \mid d \sqsubseteq \delta(m) \} \qquad (6)$$
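Equations 3 to 6 can be sketched in a few lines of Python on the data of Table 2. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, empty intersections are discarded when computing the meet, and the meet over an empty attribute set is taken to be the one-block partition.

```python
from collections import defaultdict
from functools import reduce

# Table 2 of the paper, stored column-wise (values listed for g1, g2, g3, g4).
OBJECTS = ("g1", "g2", "g3", "g4")
COLUMNS = {
    "m1": (1, 2, 2, 8),
    "m2": (2, 1, 2, 9),
    "m3": (2, 1, 1, 2),
    "m4": (1, 0, 7, 6),
    "m5": (6, 6, 6, 7),
}
# Coarsest partition (a single block); assumed neutral element of the meet.
TOP = frozenset({frozenset(OBJECTS)})


def delta(m):
    """Equations 1-2: partition induced by equal values in column m."""
    classes = defaultdict(set)
    for g, value in zip(OBJECTS, COLUMNS[m]):
        classes[value].add(g)
    return frozenset(frozenset(c) for c in classes.values())


def meet(d1, d2):
    """Equation 3: pairwise intersections of components (empty ones dropped)."""
    return frozenset(p & q for p in d1 for q in d2 if p & q)


def is_finer(d1, d2):
    """d1 is a refinement of d2: each component of d1 fits in one of d2."""
    return all(any(p <= q for q in d2) for p in d1)


def pattern_of(B):
    """Equation 5: meet of the partitions of all attributes in B."""
    return reduce(meet, (delta(m) for m in B), TOP)


def attributes_of(d):
    """Equation 6: attributes whose partition is coarser than (or equal to) d."""
    return frozenset(m for m in COLUMNS if is_finer(d, delta(m)))


d = pattern_of({"m1", "m3"})
print(sorted(sorted(p) for p in d))
print(sorted(attributes_of(d)))
```

On Table 2, the pattern of $\{m_1, m_3\}$ has the components $\{g_1\}$, $\{g_2, g_3\}$ and $\{g_4\}$, and applying Equation 6 to it returns $\{m_1, m_3, m_5\}$: column $m_5$ is also constant on every component, so it joins the closed attribute set.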

Similarly to standard FCA, we have that $(B, d)$ is a partition pattern concept (pp-concept) when $B^\square = d$ and $d^\square = B$, and that for two pp-concepts $(B_1, d_1)$ and $(B_2, d_2)$ the order between them is given by $(B_1, d_1) \le (B_2, d_2) \iff B_1 \subseteq B_2$ (or, equivalently, $d_2 \sqsubseteq d_1$). Pp-concepts determine biclusters as pairs $(p, B)$ where $p$ is a component of the partition pattern $d$. Note that, to keep consistency with previous notation, we write biclusters as pairs $(p, B)$ ($p$ represents rows and $B$ represents columns), while pp-concepts are written inversely as $(B, d)$ ($B$ is the extent and $d$ is the intent of $(B, d)$).

Proposition 1. Let (B, d) be a pp-concept, then for any partition component p ∈ d each pair (p, B) corresponds to a constant column value bicluster.

The proof of this proposition is straightforward considering that each pair (p, B) represents a submatrix the columns of which were selected using an equivalence relation, i.e. the values in the columns are the same.
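Proposition 1 can be observed on Table 2 by enumerating all pp-concepts and checking every pair $(p, B)$ they yield. The sketch below is a brute-force illustration that closes every attribute subset, which is only feasible for a toy table; it is not the AddIntent-based procedure used in the experiments, and all function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from itertools import combinations

# Table 2 of the paper, stored column-wise (values listed for g1, g2, g3, g4).
OBJECTS = ("g1", "g2", "g3", "g4")
COLUMNS = {
    "m1": (1, 2, 2, 8),
    "m2": (2, 1, 2, 9),
    "m3": (2, 1, 1, 2),
    "m4": (1, 0, 7, 6),
    "m5": (6, 6, 6, 7),
}
INDEX = {g: i for i, g in enumerate(OBJECTS)}
TOP = frozenset({frozenset(OBJECTS)})  # one-block partition


def delta(m):
    classes = defaultdict(set)
    for g, value in zip(OBJECTS, COLUMNS[m]):
        classes[value].add(g)
    return frozenset(frozenset(c) for c in classes.values())


def meet(d1, d2):
    return frozenset(p & q for p in d1 for q in d2 if p & q)


def is_finer(d1, d2):
    return all(any(p <= q for q in d2) for p in d1)


def pattern_of(B):
    return reduce(meet, (delta(m) for m in B), TOP)


def attributes_of(d):
    return frozenset(m for m in COLUMNS if is_finer(d, delta(m)))


def pp_concepts():
    """Map each closed attribute set B to its partition pattern d (brute force)."""
    concepts = {}
    for size in range(len(COLUMNS) + 1):
        for B in combinations(COLUMNS, size):
            d = pattern_of(B)
            concepts[attributes_of(d)] = d
    return concepts


def biclusters(concepts):
    """All pairs (p, B) with p a component of d, skipping the empty attribute set."""
    return {(p, B) for B, d in concepts.items() if B for p in d}


def is_constant_column(p, B):
    """Every column of B takes a single value on the rows of p."""
    return all(len({COLUMNS[m][INDEX[g]] for g in p}) == 1 for m in B)


CONCEPTS = pp_concepts()
BICLUSTERS = biclusters(CONCEPTS)
print(len(CONCEPTS), "pp-concepts,", len(BICLUSTERS), "pairs (p, B)")
print(all(is_constant_column(p, B) for p, B in BICLUSTERS))
```

Every pair produced this way is a constant column value bicluster, including single-row ones. The bicluster of Table 3 appears in extended form as $(\{g_2, g_3\}, \{m_1, m_3, m_5\})$, because $\{m_1, m_3\}$ is not closed on this table.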

We say that a bicluster $(p, B)$ is maximal iff adding an object to $p$ or an attribute to $B$ does not result in a bicluster, i.e. $(p \cup \{g\}, B)$ and $(p, B \cup \{m\})$ are not biclusters. While pp-concepts are maximal (closed under $(\cdot)^\square$), biclusters corresponding to pairs $(p, B)$ are not always maximal. This is due to the fact that pp-concepts are maximal w.r.t. the partitions and not w.r.t. the individual components of those partitions. Nevertheless, maximal biclusters are still easy to identify.

Proposition 2. Let $(B_1, d_1)$, $(B_2, d_2)$ be two pp-concepts such that $(B_1, d_1) \le (B_2, d_2)$. Let $p \subseteq G$ be a component of a partition. If $p \in d_1$ and $p \notin d_2$ then the bicluster corresponding to $(p, B_1)$ is maximal.

Proof. Given the definitions in Equations 2, 5 and 6, we have that for $(B_1, d_1)$ and for any $g_i \in p$ the following is true:

$$p = \bigcap_{m_j \in B_1} \{ g_k \in G \mid \mathbf{M}_{ij} = \mathbf{M}_{kj} \} \qquad (7)$$

Consequently, for any other object $g_h \in G$ such that $g_h \notin p$, we have $\mathbf{M}_{ij} \neq \mathbf{M}_{hj}$ for at least one $m_j \in B_1$. Hence, the pair $(p \cup \{g_h\}, B_1)$ cannot be a bicluster.

Let $B_2 = B_1 \cup \{m_j\}$ for any $m_j \in M \setminus B_1$; we show by contradiction that $(p, B_2)$ cannot be a bicluster. Let $(p, B_2)$ be a bicluster. Then, there exists the pp-concept $(B_2, B_2^\square)$ such that $p \in B_2^\square$. If it does, then it is necessarily a direct super concept of $(B_1, d_1)$. However, this contradicts the hypothesis $p \notin B_2^\square$.
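The definition of maximality can also be tested directly, by trying to add each remaining object and each remaining attribute. The following Python sketch applies this literal check to constant column value biclusters of Table 2; the helper names are hypothetical and the check is a naive one, not the lattice-based test of Proposition 2.

```python
# Table 2 of the paper, stored column-wise (values listed for g1, g2, g3, g4).
OBJECTS = ("g1", "g2", "g3", "g4")
COLUMNS = {
    "m1": (1, 2, 2, 8),
    "m2": (2, 1, 2, 9),
    "m3": (2, 1, 1, 2),
    "m4": (1, 0, 7, 6),
    "m5": (6, 6, 6, 7),
}
INDEX = {g: i for i, g in enumerate(OBJECTS)}


def is_bicluster(p, B):
    """Constant column value bicluster: non-empty, each column constant on p."""
    if not p or not B:
        return False
    return all(len({COLUMNS[m][INDEX[g]] for g in p}) == 1 for m in B)


def is_maximal(p, B):
    """True iff (p, B) is a bicluster and no single row or column can be added."""
    p, B = frozenset(p), frozenset(B)
    if not is_bicluster(p, B):
        return False
    can_add_row = any(is_bicluster(p | {g}, B) for g in OBJECTS if g not in p)
    can_add_col = any(is_bicluster(p, B | {m}) for m in COLUMNS if m not in B)
    return not (can_add_row or can_add_col)


# Both pairs below come from the pp-concept whose attribute set is {m3}.
print(is_maximal({"g1", "g4"}, {"m3"}))
print(is_maximal({"g2", "g3"}, {"m3"}))
```

The two pairs illustrate why components must be inspected individually. The component $\{g_2, g_3\}$ also belongs to the pattern of the larger concept on $\{m_1, m_3, m_5\}$, so $(\{g_2, g_3\}, \{m_3\})$ can be extended with columns, whereas $\{g_1, g_4\}$ is split in that larger pattern and $(\{g_1, g_4\}, \{m_3\})$ is maximal, in line with Proposition 2.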

Supporting similar values: In general, it is not possible to support similar value biclusters as described in Section 2 using the partition pattern structures framework. This is due to the fact that the restriction $B_{ij} \simeq_\theta B_{kl} \iff |B_{ij} - B_{kl}| \le \theta$ is not transitive and hence it is not an equivalence but a tolerance relation [10], which does not necessarily generate partitions over the set of objects. For instance, with $\theta = 1$ we have $1 \simeq_\theta 2$ and $2 \simeq_\theta 3$, but not $1 \simeq_\theta 3$. However, the setting to support this scenario is only slightly different from the partition pattern structures framework. We do not provide its description for the sake of simplicity.

Nevertheless, through the use of intervals of values we can get a close representation of similar value biclusters by considering that two rows (objects) are in the same equivalence class if their values in a given column (attribute) are within a given interval (rather than being equal as described in Equation 2). For example, consider in Table 2 the intervals $[0, 1]$ and $[6, 7]$ for attribute $m_4$. We can see that they generate the partition $\{\{g_1, g_2\}, \{g_3, g_4\}\}$. We call these intervals “equivalence blocks”, similarly to the “tolerance blocks” described in [10]. Equivalence blocks can be either pre-defined, allowing the user to include some background knowledge in the biclustering process, or calculated on-the-fly if a number of equivalence blocks $\gamma$ is specified.
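The effect of equivalence blocks on a single column can be sketched as follows in Python. Two points are assumptions made only for this illustration: the text does not say how blocks are calculated on-the-fly, so `equal_width_blocks` uses a simple equal-width split of the value range, and values falling outside every block are left out of the partition. All function names are hypothetical.

```python
OBJECTS = ("g1", "g2", "g3", "g4")
M4 = (1, 0, 7, 6)  # column m4 of Table 2, values listed for g1..g4


def block_index(value, blocks):
    """Index of the first closed interval [lo, hi] containing value, else None."""
    for idx, (lo, hi) in enumerate(blocks):
        if lo <= value <= hi:
            return idx
    return None


def partition_by_blocks(values, blocks, objects=OBJECTS):
    """Group objects whose values fall in the same equivalence block."""
    classes = {}
    for g, v in zip(objects, values):
        idx = block_index(v, blocks)
        if idx is None:
            continue  # assumption: values outside every block are ignored
        classes.setdefault(idx, set()).add(g)
    return {frozenset(c) for c in classes.values()}


def equal_width_blocks(values, gamma):
    """Hypothetical on-the-fly rule: gamma equal-width intervals over the range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / gamma
    edges = [lo + k * width for k in range(gamma)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


predefined = partition_by_blocks(M4, [(0, 1), (6, 7)])
computed = partition_by_blocks(M4, equal_width_blocks(M4, gamma=2))
print(sorted(sorted(p) for p in predefined))
print(sorted(sorted(p) for p in computed))
```

With the pre-defined intervals $[0, 1]$ and $[6, 7]$ the sketch returns the partition given in the text for $m_4$. On this column the equal-width split with $\gamma = 2$ happens to give the same partition, which need not hold for other data.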

4 Experiments

4.1 Partition pattern concept lattice calculation

In order to calculate the partition pattern concept lattice for a given data-table we used the AddIntent algorithm as described in [16]. We applied AddIntent over a subset of the MovieLens 100k dataset¹ of movie ratings containing 943 users and 50 movies (out of a total of 1682) using the predefined set of equivalence blocks [1, 2], [3, 3], [4, 5]. The dataset contains user ratings for movies which range from 1 to 5. When information is not available, the matrix contains 0, which we disregard (we do not mine biclusters with columns equal to 0). The dataset contained 16532 similar column biclusters.

Fig. 1: AddIntent iterations per prune (x axis, Iterations [units]) vs. execution time (y axis, Time [s]).

¹ http://grouplens.org/datasets/movielens/

Empirical results showed that less than 20% of the pp-concepts within the pp-lattice actually hold a maximal bicluster. In order to improve the efficiency of AddIntent for biclustering purposes, we have included a pruning step that is run after every fixed number of AddIntent iterations (an iteration takes place each time a new intent is added to the lattice). The pruning step consists of removing from the lattice any concept that does not hold a maximal bicluster. Figure 1 shows experimental results in this regard. The graphic shows the execution time (y axis) taken by AddIntent to calculate the 16532 biclusters when a pruning step was run every given number of iterations (x axis). The solid horizontal line represents the execution time without pruning (30.5 seconds). When the lattice is pruned at every AddIntent iteration, the execution time is double that of the non-optimized version; as the number of iterations between prunes grows, it quickly stabilizes around half the time of the non-optimized version. The best time is found for 40 iterations (15 seconds).

The pruning affects the number of intent intersections performed by AddIntent. When the lattice is pruned, there are not as many intents to intersect as there were originally. However, pruning the lattice is an expensive task and adds overhead to the algorithm. The correct balance of this trade-off leads to substantial improvements in performance (a twofold speed-up in our experiments); however, further experimentation on different numerical data-tables is needed to draw more conclusions regarding its setting.

4.2 Biclusters quality

A second experiment was performed over an example dataset provided with the system BicAt², containing 419 objects and 70 attributes. We measure the performance of our approach mining similar row biclusters compared with Cheng and Church’s algorithm (CC) [3]. CC tries to find a determined number of biclusters with a maximum threshold δ for the mean squared error. Results are shown in Table 4. Parameters for the pp-lattice are the number of equivalence blocks γ and the minimal number of columns in the bicluster σ. CC was executed as provided by BicAt and other parameters were left at the system’s defaults.

² http://www.tik.ee.ethz.ch/sop/bicat/

| Algorithm | Time [s] | Biclusters [Kunits] | Parameters | MSE Max | Max Size [cells] |
|---|---|---|---|---|---|
| PPL | 451 | 901 | γ=20, σ=10 | 0.016 | 209 |
| PPL | 27 | 36 | γ=10, σ=30 | 0.032 | 372 |
| PPL | 306 | 390 | γ=10, σ=25 | 0.037 | 442 |
| PPL | 3,404 | 4,471 | γ=10, σ=20 | 0.041 | 462 |
| PPL | 253 | 314 | γ=5, σ=50 | 0.259 | 1,173 |
| CC | 418 | 1 | δ=0.5 | 3.2 | 17,752 |
| CC | 416 | 1 | δ=0.3 | 2.81 | 17,752 |
| CC | 4,018 | 10 | δ=0.1 | 4.92 | 17,752 |

Table 4: Comparison between CC and pp-lattice bicluster algorithm.

Results show a generally better performance of our approach, which is able to mine more than four million maximal biclusters from the dataset in less time than CC takes to calculate only ten thousand. In terms of mean squared error (MSE), our approach obtains smaller scores, which indicates better quality biclusters.
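The text does not spell out how the MSE values of Table 4 are computed. One reading, assumed here purely for illustration, is the mean squared residue that Cheng and Church [3] bound with their threshold δ: for a submatrix with row means $a_{iJ}$, column means $a_{Ij}$ and overall mean $a_{IJ}$, it averages $(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2$ over all cells. The Python sketch below implements that score; the function name is hypothetical and the reported figures may have been obtained differently.

```python
def mean_squared_residue(sub):
    """Cheng and Church's mean squared residue of a non-empty submatrix.

    `sub` is a list of equally long rows of numbers.
    """
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(row[j] for row in sub) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Rows g2, g3 and columns m1, m3, m5 of Table 2: a constant column bicluster.
print(mean_squared_residue([[2, 1, 6], [2, 1, 6]]))
# A submatrix following the additive model c + alpha_i + beta_j.
print(mean_squared_residue([[1, 2], [3, 4]]))
# A submatrix that fits neither model.
print(mean_squared_residue([[1, 2], [2, 1]]))
```

Under this score, exact constant column and coherent value biclusters have residue 0, so non-zero values for the pp-lattice biclusters would stem from the width of the equivalence blocks, which is consistent with the MSE growing as γ decreases in Table 4.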

CC is able to find larger biclusters compared to our approach given the top-down strategy it implements. While larger biclusters can be found with our approach by decreasing the number of equivalence blocks (γ), this is done at the cost of increasing the MSE as shown in Table 4. Compared to CC, our approach is better at finding many high quality and rather small biclusters inducing specialized associations among objects. CC is better at creating a global map of the entire data-table by finding larger biclusters.

5 Conclusions and research perspectives

In this work we have presented a novel technique for exhaustive similar row/column value biclustering based on FCA algorithms using partition pattern structures. We have shown the capabilities of the technique, which is able to find a large number of high quality biclusters. Furthermore, biclusters are provided with an overlapping hierarchy based on a concept lattice structure. How to leverage current bicluster analysis techniques using the concept lattice is still a matter of research.

Partition pattern structures were initially proposed for functional dependencies mining [1] using association rules from pp-concepts. How these techniques may benefit from the current approach, and vice versa, is an interesting subject which should be explored. Using other techniques of formal concept selection and filtering, and their associations with biclusters, is another compelling aspect for future work.


References

1. Jaume Baixeries, Mehdi Kaytoue, and Amedeo Napoli, ‘Characterizing functional dependencies in formal concept analysis with pattern structures’, Annals of Mathematics and Artificial Intelligence, (2014).

2. Jérémy Besson, Céline Robardet, Luc Raedt, and Jean-François Boulicaut, ‘Mining bi-sets in numerical data’, in Knowledge Discovery in Inductive Databases, (2007).

3. Yizong Cheng and George M. Church, ‘Biclustering of expression data’, in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, (2000).

4. Adelaide Valente Freitas, Wassim Ayadi, Mourad Elloumi, José Luis Oliveira, José Luis Oliveira, and Jin-Kao Hao, Survey on Biclustering of Gene Expression Data, 2013.

5. Bernhard Ganter and Sergei O. Kuznetsov, ‘Pattern Structures and their projections’, Conceptual Structures: Broadening the Base, (2001).

6. Bernhard Ganter and Rudolf Wille, Formal Concept Analysis, mathematic edn., 1999.

7. Mehdi Kaytoue, Sergei O. Kuznetsov, Juraj Macko, and Amedeo Napoli, ‘Biclustering meets triadic concept analysis’, Annals of Mathematics and Artificial Intelligence, (2013).

8. Mehdi Kaytoue, Sergei O. Kuznetsov, and Amedeo Napoli, ‘Biclustering numerical data in formal concept analysis’, in Formal Concept Analysis, (2011).

9. Mehdi Kaytoue, Sergei O. Kuznetsov, and Amedeo Napoli, ‘Revisiting numerical pattern mining with formal concept analysis’, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, (November 2011).

10. Sergei O. Kuznetsov, ‘Galois connections in data analysis: Contributions from the Soviet era and modern Russian research’, in Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, (2005).

11. Sergei O. Kuznetsov and Sergei Obiedkov, ‘Comparing Performance of Algorithms for Generating Concept Lattices’, Journal of Experimental and Theoretical Artificial Intelligence, (2002).

12. Sara C. Madeira and Arlindo L. Oliveira, ‘Biclustering algorithms for biological data analysis: A survey’, IEEE/ACM Trans. Comput. Biol. Bioinformatics, (January 2004).

13. Sadaaki Miyamoto, ‘Lattice-valued hierarchical clustering for analyzing information systems’, in Rough Sets and Current Trends in Computing, volume 4259 of Lecture Notes in Computer Science, (2006).

14. Gaurav Pandey, Gowtham Atluri, Michael Steinbach, Chad L. Myers, and Vipin Kumar, ‘An association analysis approach to biclustering’, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2009).

15. Gianvito Pio, Michelangelo Ceci, Corrado Loglisci, Domenica D’Elia, and Donato Malerba, ‘A novel biclustering algorithm for the discovery of meaningful biological correlations between miRNAs and mRNAs’, EMBnet.journal, 18 (A), (2012).