HAL Id: hal-01095884
https://hal.inria.fr/hal-01095884
Submitted on 16 Dec 2014
Bicluster enumeration using Formal Concept Analysis
Victor Codocedo, Amedeo Napoli
To cite this version:
Victor Codocedo, Amedeo Napoli. Bicluster enumeration using Formal Concept Analysis. What formal concept analysis can do for artificial intelligence? (FCA4AI 2014) Workshop at ECAI 2014, Aug 2014, Prague, Czech Republic. ⟨hal-01095884⟩
Víctor Codocedo and Amedeo Napoli
LORIA - CNRS - INRIA - Université de Lorraine, BP 239, 54506 Vandœuvre-lès-Nancy.
victor.codocedo@loria.fr, amedeo.napoli@loria.fr
Abstract. In this work we introduce a novel technique to enumerate constant row/column value biclusters using formal concept analysis. To achieve this, a numerical data-table (standard input for biclustering algorithms) is modelled as a many-valued context where rows represent objects and columns represent attributes. Using equivalence relations defined for each single column, we are able to translate the bicluster mining problem in terms of the partition pattern structure framework. We show how biclustering can benefit from the FCA framework through its robust theoretical description and efficient algorithms. Finally, we show how this technique is able to find high quality biclusters (in terms of the mean squared error) more efficiently than a state-of-the-art bicluster algorithm.
1 Introduction
Biclustering has become a fundamental tool for bioinformatics and gene expression analysis [4]. Unlike standard clustering, where objects are compared and grouped together based on their full descriptions, biclustering generates groups of objects based on a subset of their attributes, values or conditions. Thus biclusters are able to represent object relations at a local scale instead of the global representation given by an object cluster [12]. In this sense, biclustering has many elements in common with Formal Concept Analysis (FCA) [6].

In FCA, objects are grouped together by the attributes they share in what is called a formal concept. Furthermore, formal concepts are arranged in a hierarchical and overlapping structure called a concept lattice. Hence a formal concept can be considered as a bicluster of objects and attributes representing relations at a local scale, while the lattice structure gives a description at the global scale. FCA is not only analogous to biclustering, but has much to offer in terms of mining techniques and algorithms [10]. The concept lattice can also provide biclusters with an overlapping hierarchy, which has been reported as an important feature for bicluster analysis [15]. Recently, some approaches that use FCA algorithms to mine biclusters from a numerical data-table have been introduced, showing good potential [8, 7]. In this work, we present a novel technique for lattice-based biclustering using the pattern structure framework [5], an extension of FCA to deal with complex data. More specifically, we propose a technique for mining biclusters with similar row/column values, a specialization of biclustering focused on mining attributes with coherent variations, i.e. the difference between two attributes is the same for a group of objects [12].
We show that, by the use of partition pattern structures [1], we can find high quality maximal biclusters (w.r.t. the mean squared error). Finally, we compare our approach with a standard constant row value algorithm [3], showing the capabilities and limitations of our approach.
The remainder of this paper is organized as follows. The basics of biclustering are introduced in Section 2. Section 3 presents our approach and Section 4 presents the experiments and initial findings of our biclustering technique. Finally, Section 5 concludes our article and presents some new perspectives of research.
2 Biclustering definitions
A numerical data-table is a matrix $\mathbf{M}$ where $\mathbf{M}_{ij}$ indicates the value of an object $g_i \in G$ w.r.t. the attribute $m_j \in M$ with $i \in [1..|G|]$ and $j \in [1..|M|]$ ($|\cdot|$ represents set cardinality). A bicluster of $\mathbf{M}$ is a submatrix $B$ where each value $B_{ij}$ satisfies a given restriction. According to [4, 12], there are five different restrictions which we summarize in Table 1.
| Type | Model | Description |
|---|---|---|
| Constant values | $B_{ij} = c$ | Within the submatrix, all values are equal to a constant $c \in \mathbb{R}$ ($\mathbb{R}$ indicates real values). |
| Constant row values | $B_{ij} = c + \alpha_i$ | Within the submatrix, all the values in a given row $i$ are equal to a constant $c$ plus a row adjustment $\alpha_i \in \mathbb{R}$. |
| Constant column values | $B_{ij} = c + \alpha_j$ | Within the submatrix, all the values in a given column $j$ are equal to a constant $c$ plus a column adjustment $\alpha_j \in \mathbb{R}$. |
| Coherent values | $B_{ij} = c + \alpha_i + \beta_j$ | Within the submatrix, all the values in a given column $j$ are equal to a constant $c$ plus a row adjustment $\alpha_i$ and a column adjustment $\beta_j$. Instead of addition, the model can also consider multiplicative factors. |
| Coherent evolution | | Values in the submatrix induce a linear order. |

Table 1: Types of biclusters.
Similar values instead of constant values. When noise is present in a data-table, it is difficult to search for constant values. Several approaches have tackled this issue in different ways, e.g. by the use of evaluation functions [14], equivalence relations [2, 13] and tolerance relations [7]. The most common way is establishing a threshold $\theta \in \mathbb{R}$ to enable the similarity comparison of two different values $w_1, w_2 \in \mathbb{R}$. We say that $w_1 \simeq_\theta w_2$ (values are similar) iff $|w_1 - w_2| \le \theta$.
Thus, constant values are a special case of similar values when θ = 0. Using this, we can redefine the first three types of biclusters as follows:
1. Similar values: $B_{ij} \simeq_\theta B_{kl}$.
2. Similar row/column values:
   (a) Similar row values: $B_{ij} \simeq_\theta B_{il}$.
   (b) Similar column values: $B_{ij} \simeq_\theta B_{kj}$.
Example 1. With $\theta = 1$, Table 2 shows in its upper left corner a bicluster with similar values (dark grey). The upper right corner represents a similar column bicluster (light grey). The lower left corner, considering $\{g_3, g_4\}$ and $\{m_1, m_2\}$ (not marked in the table), represents a similar row bicluster.
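|       | $m_1$ | $m_2$ | $m_3$ | $m_4$ | $m_5$ |
|---|---|---|---|---|---|
| $g_1$ | 1 | 2 | 2 | 1 | 6 |
| $g_2$ | 2 | 1 | 1 | 0 | 6 |
| $g_3$ | 2 | 2 | 1 | 7 | 6 |
| $g_4$ | 8 | 9 | 2 | 6 | 7 |

Table 2: Bicluster with similar values ($\theta = 1$).

|       | $m_1$ | $m_3$ |
|---|---|---|
| $g_2$ | 2 | 1 |
| $g_3$ | 2 | 1 |

Table 3: Constant column bicluster.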
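The three similarity restrictions can be checked mechanically on the data of Table 2. The following Python sketch is only an illustration: the function names are hypothetical, rows and columns are 0-indexed, and the shaded regions are assumed to be $\{g_1, g_2, g_3\} \times \{m_1, m_2, m_3\}$ (dark grey) and $\{g_1, g_2\} \times \{m_4, m_5\}$ (light grey), since the shading itself is not reproduced here.

```python
# Data of Table 2: rows g1..g4, columns m1..m5 (0-indexed below).
TABLE2 = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def submatrix(table, rows, cols):
    """Extract the block of `table` given by row and column indices."""
    return [[table[i][j] for j in cols] for i in rows]


def all_similar(values, theta):
    """All values are pairwise within theta iff max - min <= theta."""
    return max(values) - min(values) <= theta


def is_similar_values(block, theta):
    """Every pair of cells in the block is similar."""
    return all_similar([v for row in block for v in row], theta)


def is_similar_rows(block, theta):
    """Within each row, all cells are similar."""
    return all(all_similar(row, theta) for row in block)


def is_similar_columns(block, theta):
    """Within each column, all cells are similar."""
    return all(all_similar(col, theta) for col in zip(*block))


upper_left = submatrix(TABLE2, [0, 1, 2], [0, 1, 2])   # assumed dark-grey block
upper_right = submatrix(TABLE2, [0, 1], [3, 4])        # assumed light-grey block
lower_left = submatrix(TABLE2, [2, 3], [0, 1])         # {g3, g4} x {m1, m2}
```

With $\theta = 1$, `is_similar_values(upper_left, 1)`, `is_similar_columns(upper_right, 1)` and `is_similar_rows(lower_left, 1)` all hold, while `upper_right` is not a similar-values bicluster (1 and 6 differ by more than $\theta$).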
3 Biclustering using partition pattern structures
The pattern structure framework is an extension of FCA proposed to deal with complex data [5]. Partition pattern structures are an instance of the pattern structure framework proposed to mine functional dependencies among attributes of a database [1] dealing with set partitions. In the following, we provide the specifics of partition pattern structures where the main definitions are given in [5].
Let $G$ be a set of objects, $M$ a set of attributes and $\mathbf{M}$ a data-table of numerical values where $\mathbf{M}_{ij}$ contains the value of attribute (column) $m_j \in M$ in object (row) $g_i \in G$. A partition $d = \{p_i\}$ of the set $G$ can be formalized as a collection of components $p_i$ such that:

$$\bigcup_{p_i \in d} p_i = G \qquad p_i \cap p_j = \emptyset \quad (p_i, p_j \in d,\ i \neq j)$$
Two partitions can be ordered by the coarser-finer relation where we say that a partition $d_1 = \{p_i\}$ is a refinement of $d_2 = \{p_j\}$ (or $d_2$ is a coarsening of $d_1$) iff $\forall p_i \in d_1, \exists p_j \in d_2, p_i \subseteq p_j$. We denote this as $d_1 \sqsubseteq d_2$, where $d_1, d_2 \in D$ and $D$ is the space of all partitions of the set $G$.
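As a small illustration of the two definitions above, the following Python sketch checks whether a collection of sets is a partition of $G$ and whether one partition refines another. The function names and the example partitions are hypothetical and chosen only for this illustration.

```python
def is_partition(d, G):
    """True iff the components of d are non-empty, pairwise disjoint and cover G."""
    comps = [frozenset(p) for p in d]
    covered = set().union(*comps)
    # Disjointness: the component sizes add up to the size of their union.
    disjoint = sum(len(p) for p in comps) == len(covered)
    return all(comps) and disjoint and covered == set(G)


def refines(d1, d2):
    """d1 is a refinement of d2: each component of d1 lies inside a component of d2."""
    return all(any(set(p) <= set(q) for q in d2) for p in d1)


G = {"g1", "g2", "g3", "g4"}
d_fine = [{"g1"}, {"g2", "g3"}, {"g4"}]
d_coarse = [{"g1", "g4"}, {"g2", "g3"}]
```

Here `refines(d_fine, d_coarse)` holds while `refines(d_coarse, d_fine)` does not, since $\{g_1, g_4\}$ is not contained in any component of the finer partition.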
Let us define the mapping function $\delta : M \to D$, which assigns to each attribute in $M$ the partition it generates over the set of objects $G$, as follows:

$$\delta(m_j) = \{[g_i]_{m_j} \mid g_i \in G\} \tag{1}$$

$$[g_i]_{m_j} = \{g_k \in G \mid \mathbf{M}_{ij} = \mathbf{M}_{kj}\} \tag{2}$$

where $[g_i]_{m_j}$ is the equivalence class of $g_i$ w.r.t. attribute $m_j$, i.e. the set of rows in data-table $\mathbf{M}$ which have the same value in column $m_j$ as row $g_i$. Since the set of equivalence classes for a given attribute generates a partition over $G$, it follows that $\delta(m_j) \in D$ for any $m_j \in M$.
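Equations 1 and 2 amount to grouping rows by their value in one column. A minimal Python sketch, using the values of Table 2 with exact equality and 0-based row and column indices (the name `delta` mirrors $\delta$ but is otherwise our own choice):

```python
from collections import defaultdict

# Data of Table 2: rows g1..g4, columns m1..m5 (0-indexed below).
TABLE2 = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def delta(table, j):
    """Partition of the row indices induced by equal values in column j."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[row[j]].add(i)          # rows with the same value share a class
    return {frozenset(c) for c in classes.values()}


partition_m1 = delta(TABLE2, 0)  # {{g1}, {g2, g3}, {g4}}
partition_m3 = delta(TABLE2, 2)  # {{g1, g4}, {g2, g3}}
```

For column $m_1$ the values are 1, 2, 2, 8, so $g_2$ and $g_3$ fall in the same equivalence class and the other two rows form singleton classes.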
It is easy to show that the order in the space of object partitions $D$ defines a complete lattice for which the similarity operator $\sqcap$ for any two partitions $d_1 = \{p_i\}$, $d_2 = \{p_j\}$ in $D$ is defined as follows (empty intersections are discarded):

$$d_1 \sqcap d_2 = \bigcup_{p_i \in d_1,\, p_j \in d_2} \{\, p_i \cap p_j \,\} \tag{3}$$

$$d_1 \sqsubseteq d_2 \iff d_1 \sqcap d_2 = d_1 \tag{4}$$

Then, a partition pattern structure is determined by the triple $(M, (D, \sqcap), \delta)$ in which the following derivation operators for $B \subseteq M$ and $d \in D$ are defined:
$$B^{\square} = \bigsqcap_{m \in B} \delta(m) \tag{5}$$

$$d^{\square} = \{m \in M \mid d \sqsubseteq \delta(m)\} \tag{6}$$

Similarly to standard FCA, we have that $(B, d)$ is a partition pattern concept (pp-concept) when $B^{\square} = d$ and $d^{\square} = B$, and that for two pp-concepts $(B_1, d_1)$ and $(B_2, d_2)$ the order between them is given by $(B_1, d_1) \le (B_2, d_2) \iff (B_1 \subseteq B_2)$ or $(d_2 \sqsubseteq d_1)$. Pp-concepts determine biclusters as pairs $(p, B)$ where $p$ is a component of the partition pattern $d$. It should be noticed that, to keep consistency with previous notation, we write biclusters as pairs $(p, B)$ ($p$ represents rows and $B$ represents columns), while pp-concepts are written inversely as $(B, d)$ ($B$ is the extent and $d$ is the intent of $(B, d)$).
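To make the operators concrete, the sketch below implements the meet of two partitions, the order test of Equation 4 and the two derivation operators of Equations 5 and 6 in Python, and then enumerates pp-concepts by brute force over all attribute subsets. This is only an illustration on the values of Table 2 with exact equality; it is exponential in the number of columns and is not the AddIntent-based procedure used in the experiments. All function names are hypothetical, indices are 0-based, and the empty attribute set is mapped to the one-block partition by convention.

```python
from collections import defaultdict
from itertools import combinations

# Data of Table 2: rows g1..g4, columns m1..m5 (0-indexed below).
TABLE2 = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def delta(table, j):
    """Partition of the row indices induced by equal values in column j."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[row[j]].add(i)
    return {frozenset(c) for c in classes.values()}


def meet(d1, d2):
    """Equation 3: all non-empty pairwise intersections of components."""
    return {p & q for p in d1 for q in d2 if p & q}


def refines(d1, d2):
    """Equation 4: d1 refines d2 iff their meet is d1 itself."""
    return meet(d1, d2) == set(d1)


def attrs_to_partition(table, B):
    """Equation 5: meet of the partitions of all attributes in B."""
    d = {frozenset(range(len(table)))}   # one-block partition (convention for empty B)
    for j in B:
        d = meet(d, delta(table, j))
    return d


def partition_to_attrs(table, d):
    """Equation 6: attributes whose partition is coarser than (or equal to) d."""
    return frozenset(j for j in range(len(table[0])) if refines(d, delta(table, j)))


def pp_concepts(table):
    """Brute-force enumeration: close every attribute subset."""
    n_cols = len(table[0])
    found = set()
    for k in range(n_cols + 1):
        for B in combinations(range(n_cols), k):
            d = attrs_to_partition(table, B)
            found.add((partition_to_attrs(table, d), frozenset(d)))
    return found


closure_m1 = partition_to_attrs(TABLE2, attrs_to_partition(TABLE2, [0]))
concepts = pp_concepts(TABLE2)
```

On this data the closure of $\{m_1\}$ is $\{m_1, m_3, m_5\}$ with partition $\{\{g_1\}, \{g_2, g_3\}, \{g_4\}\}$; its component $\{g_2, g_3\}$ gives the constant column bicluster $(\{g_2, g_3\}, \{m_1, m_3, m_5\})$.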
Proposition 1. Let (B, d) be a pp-concept, then for any partition component p ∈ d each pair (p, B) corresponds to a constant column value bicluster.
The proof of this proposition is straightforward considering that each pair (p, B) represents a submatrix the columns of which were selected using an equivalence relation, i.e. the values in the columns are the same.
We say that a bicluster $(p, B)$ is maximal iff adding an object to $p$ or an attribute to $B$ does not result in a bicluster, i.e. $(p \cup \{g\}, B)$ and $(p, B \cup \{m\})$ are not biclusters. While pp-concepts are maximal (closed under $(\cdot)^{\square}$), biclusters corresponding to pairs $(p, B)$ are not always maximal. This is due to the fact that pp-concepts are maximal w.r.t. the partitions and not w.r.t. the individual components of those partitions. Nevertheless, maximal biclusters are still easy to identify.
Proposition 2. Let $(B_1, d_1)$, $(B_2, d_2)$ be two pp-concepts such that $(B_1, d_1) \le (B_2, d_2)$. Let $p \subseteq G$ be a component of a partition. If $p \in d_1$ and $p \notin d_2$ then the bicluster corresponding to $(p, B_1)$ is maximal.
Proof. Given the definitions in Equations 2, 5 and 6, we have that for $(B_1, d_1)$ and for any $g_i \in p$, the following is true:

$$p = \bigcap_{m_j \in B_1} \{g_k \in G \mid \mathbf{M}_{ij} = \mathbf{M}_{kj}\} \tag{7}$$

Consequently, for any other object $g_h \in G$ such that $g_h \notin p$, we have $\mathbf{M}_{ij} \neq \mathbf{M}_{hj}$ for some $m_j \in B_1$. Hence, the pair $(p \cup \{g_h\}, B_1)$ cannot be a bicluster.

Let $B_2 = B_1 \cup \{m_j\}$ for any $m_j \in M$. We show by contradiction that $(p, B_2)$ cannot be a bicluster. Let $(p, B_2)$ be a bicluster. Then, there exists the pp-concept $(B_2, B_2^{\square})$ such that $p \in B_2^{\square}$. If it does, then it is necessarily a direct super concept of $(B_1, d_1)$. However, this contradicts the assumption $p \notin B_2^{\square}$.
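The definition of maximality can also be tested directly, without going through the lattice. The Python sketch below is a brute-force check written for illustration only (the function names are hypothetical, indices are 0-based, and exact equality is used on the values of Table 2): it tries every single row and every single column that could be added to a constant column bicluster.

```python
# Data of Table 2: rows g1..g4, columns m1..m5 (0-indexed below).
TABLE2 = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def is_constant_column(table, rows, cols):
    """Every selected column takes a single value over the selected rows."""
    return all(len({table[i][j] for i in rows}) == 1 for j in cols)


def is_maximal(table, rows, cols):
    """A constant column bicluster that no extra row or column can extend."""
    rows, cols = set(rows), set(cols)
    if not is_constant_column(table, rows, cols):
        return False
    other_rows = set(range(len(table))) - rows
    other_cols = set(range(len(table[0]))) - cols
    if any(is_constant_column(table, rows | {g}, cols) for g in other_rows):
        return False
    if any(is_constant_column(table, rows, cols | {m}) for m in other_cols):
        return False
    return True


table3_maximal = is_maximal(TABLE2, {1, 2}, {0, 2})        # ({g2, g3}, {m1, m3})
extended_maximal = is_maximal(TABLE2, {1, 2}, {0, 2, 4})   # ({g2, g3}, {m1, m3, m5})
```

Under exact equality, the bicluster of Table 3 is a valid constant column bicluster but it is not maximal, because $g_2$ and $g_3$ also agree on $m_5$; the extended pair $(\{g_2, g_3\}, \{m_1, m_3, m_5\})$ is maximal.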
Supporting similar values: In general, it is not possible to support similar value biclusters as described in Section 2 using the partition pattern structures framework. This is due to the fact that the restriction $B_{ij} \simeq_\theta B_{kl} \iff |B_{ij} - B_{kl}| \le \theta$ is not transitive and hence it is not an equivalence but a tolerance relation [10], which does not necessarily generate partitions over the set of objects. However, the setting to support this scenario is only slightly different from the partition pattern structures framework. We do not provide its description for the sake of simplicity.
Nevertheless, through the use of intervals of values we can get a close representation of similar value biclusters by considering that two rows (objects) are in the same equivalence class if their values in a given column (attribute) are within a given interval (rather than being equal as described in Equation 2). For example, consider in Table 2 the intervals $[0, 1]$ and $[6, 7]$ for attribute $m_4$. We can see that it generates the partition $\{\{g_1, g_2\}, \{g_3, g_4\}\}$. We call these intervals "equivalence blocks", similarly to the "tolerance blocks" described in [10]. Equivalence blocks can be either pre-defined, allowing the user to include some background knowledge in the biclustering process, or calculated on-the-fly if a number of equivalence blocks $\gamma$ is specified.
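The following Python sketch illustrates equivalence blocks on column $m_4$ of Table 2. The function names are hypothetical and indices are 0-based. Two points are assumptions of this sketch rather than statements of the text: values falling in no block are simply left out of the partition, and the on-the-fly computation from $\gamma$ is done with equal-width intervals, which is only one possible choice since the text does not specify the procedure.

```python
# Data of Table 2: rows g1..g4, columns m1..m5 (0-indexed below).
TABLE2 = [
    [1, 2, 2, 1, 6],
    [2, 1, 1, 0, 6],
    [2, 2, 1, 7, 6],
    [8, 9, 2, 6, 7],
]


def block_partition(table, j, blocks):
    """Group rows whose value in column j falls in the same closed interval."""
    classes = {}
    for i, row in enumerate(table):
        for b, (low, high) in enumerate(blocks):
            if low <= row[j] <= high:
                classes.setdefault(b, set()).add(i)
                break                    # first matching block wins
    return {frozenset(c) for c in classes.values()}


def equal_width_blocks(values, gamma):
    """Assumed strategy: split [min, max] into gamma intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / gamma
    edges = [low + k * width for k in range(gamma)] + [high]
    return list(zip(edges[:-1], edges[1:]))


predefined = block_partition(TABLE2, 3, [(0, 1), (6, 7)])
m4_values = [row[3] for row in TABLE2]
computed = block_partition(TABLE2, 3, equal_width_blocks(m4_values, 2))
```

Both the predefined intervals $[0, 1]$, $[6, 7]$ and the two computed equal-width intervals yield the partition $\{\{g_1, g_2\}, \{g_3, g_4\}\}$ for $m_4$.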
4 Experiments
4.1 Partition pattern concept lattice calculation
In order to calculate the partition pattern concept lattice for a given data-table we used the AddIntent algorithm as described in [16]. We applied AddIntent over a subset of the dataset called MovieLens 100k (http://grouplens.org/datasets/movielens/) of movie ratings containing 943 users and 50 movies (out of a total of 1682) using the predefined set of equivalence blocks [1, 2][3, 3][4, 5]. The dataset contains user ratings for movies which range from 1 to 5. When information is not available, the matrix contains 0 which we disregard (we do not mine biclusters with columns equal to 0). The dataset contained 16532 similar column biclusters.

Fig. 1: AddIntent Iterations per prune vs Execution time (x axis: Iterations [units]; y axis: Time [s]).
Empirical results showed that less than 20% of the pp-concepts within the pp-lattice actually hold a maximal bicluster. In order to improve the efficiency of AddIntent for biclustering purposes, we included a pruning step that runs after every fixed number of AddIntent iterations (an iteration being each time a new intent is added to the lattice). The pruning step consists of removing from the lattice any concept that does not hold a maximal bicluster. Figure 1 shows experimental results in this regard. The graphic shows the execution time (y axis) taken by AddIntent to calculate the 16532 biclusters when a pruning step was performed every given number of iterations (x axis). The solid horizontal line represents the execution time without pruning (30.5 seconds). Initially, when the lattice is pruned at each AddIntent iteration, the execution time is double that of the non-optimized version; it then quickly stabilizes at around half the time of the non-optimized version. The best time is found for 40 iterations (15 seconds).
The pruning affects the number of intent intersections performed by AddIntent. When the lattice is pruned, there are not as many intents to intersect as there were originally. However, pruning the lattice is an expensive task and adds overhead to the algorithm. The correct balance of this trade-off leads to a substantial improvement in performance (a twofold speed-up in our experiments); however, further experimentation on different numerical data-tables is needed to draw more conclusions regarding its setting.
4.2 Biclusters quality
A second experiment was performed over an example dataset provided with the system BicAt (http://www.tik.ee.ethz.ch/sop/bicat/) containing 419 objects and 70 attributes. We measure the performance of our approach mining similar row biclusters compared with Cheng and Church's algorithm (CC) [3]. CC tries to find a determined number of biclusters with a maximum threshold for the mean squared error $\delta$. Results are shown in Table 4. Parameters for pp-lattice are the number of equivalence blocks $\gamma$ and the minimal number of columns in the cluster $\sigma$. CC was executed as provided by BicAt and other parameters were left as system's default.
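| Algorithm | Time [s] | Biclusters [Kunits] | Parameters | Max MSE | Max Size [cells] |
|---|---|---|---|---|---|
| PPL | 451 | 901 | $\gamma = 20$, $\sigma = 10$ | 0.016 | 209 |
| PPL | 27 | 36 | $\gamma = 10$, $\sigma = 30$ | 0.032 | 372 |
| PPL | 306 | 390 | $\gamma = 10$, $\sigma = 25$ | 0.037 | 442 |
| PPL | 3,404 | 4,471 | $\gamma = 10$, $\sigma = 20$ | 0.041 | 462 |
| PPL | 253 | 314 | $\gamma = 5$, $\sigma = 50$ | 0.259 | 1,173 |
| CC | 418 | 1 | $\delta = 0.5$ | 3.2 | 17,752 |
| CC | 416 | 1 | $\delta = 0.3$ | 2.81 | 17,752 |
| CC | 4,018 | 10 | $\delta = 0.1$ | 4.92 | 17,752 |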
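The text does not spell out how the MSE of a bicluster is computed. One common choice, and the score that Cheng and Church's algorithm bounds with its threshold $\delta$, is the mean squared residue of the submatrix. The Python sketch below assumes that definition; it is an illustration with a hypothetical function name and may differ from the score actually reported in the table.

```python
def mean_squared_residue(block):
    """Mean squared residue of a submatrix given as a list of rows.

    The residue of a cell is value - row mean - column mean + overall mean,
    so purely additive (coherent value) blocks score exactly zero.
    """
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(col) / n_rows for col in zip(*block)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(block):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


additive_block = [[1, 2], [3, 4]]      # rows differ by a constant offset
similar_row_block = [[2, 2], [8, 9]]   # {g3, g4} x {m1, m2} from Table 2
```

For the additive block the score is 0, and for the similar row bicluster $\{g_3, g_4\} \times \{m_1, m_2\}$ of Table 2 it is 0.0625, which illustrates why near-constant rows give low scores under this measure.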