A relationist and descriptive approach to stationary time series


HAL Id: hal-00203217

https://hal.archives-ouvertes.fr/hal-00203217

Submitted on 9 Jan 2008



Aurélien Hazan, Vincent Vigneron

To cite this version:

Aurélien Hazan, Vincent Vigneron. A relationist and descriptive approach to stationary time series.

European Conference on Complex Systems (ECCS07), Oct 2007, Dresden, Germany. pp.00. ⟨hal-00203217⟩

Aurélien Hazan¹ and Vincent Vigneron¹,²

1 TADIB, CNRS FRE 2873, 91020 Evry-Courcouronnes, France,

{aurelien.hazan,vincent.vigneron}@ibisc.univ-evry.fr

2 CES SAMOS-MATISSE CNRS URM 8095, 90 rue de Tolbiac, 75634 Paris cedex 13, France


Summary. This article addresses the issue of building discrete topological spaces from continuous data measured on a complex system, and then the statistical characterization of the obtained space. As an illustration, the sensitivity of graph properties to thresholding is analysed. A possible way to cope with that flaw is the multilevel point of view. We extend this approach to n-ary relations using simplicial complexes; statistical independence is shown to be an appropriate framework for characterizing the obtained space.

Key words: knowledge representation, random graph, binary relations, correlation, independence, simplicial complex, persistent homology

1 Introduction

In the complex systems literature, a whole field of research is dedicated to expressing the organizational principles that shape large-scale networks [1] and their evolution [?].

Let us take an example. In biology, three types of macromolecular networks rule the inner organization of cells: metabolic, protein-protein, and genetic regulation networks. It has been proposed that metabolic networks should be encoded in a graph-theoretic way, which allows random graph theory to characterize them [3]: the elementary relation that unites two metabolites of the network is the existence of a reaction catalyzed by given enzymes. Similar principles rule intracellular processes in many organisms, in a scale-free manner that entails, for example, a remarkable resistance to errors. However, the great heterogeneity of reaction strengths [2] questions the rationale of using unweighted graphs to represent the network activity.

In neurophysiology, brain networks can be examined from several points of view: structural or anatomical studies on one hand, and functional and effective ones on the other [?]. The first area deals with physical connections at different possible scales, whether at the level of individual neurons or of brain areas. The former involves large-scale networks while the latter lays stress on small-scale networks. We do not pick examples from the field of structural connectivity, but rather from functional and effective connectivity, which both examine the activity either of neurons or of brain areas. The functional case favors statistical interdependence irrespective of causality, while the effective connectivity case is concerned with the causal explanation of the activities of neural areas. The often quoted articles [7] illustrate the first approach. See also [?, ?] in the artificial networks context.

The two examples above underline one limit of graph-like representations: the topology of the graph may strongly depend on the definition of the binary relation that conditions the existence of an edge between two nodes. One solution would be to assign weights to edges [?, 3], taking the stand of generalizing invariants defined for unweighted graphs to weighted ones. In this article we explore an alternative proposition: first we define a threshold-dependent relation to ground the existence of edges, then we superpose several graphs for different threshold values, and finally we characterize the global structure.

In the following, section 2 reviews the representation of relationships in the experimental context, then states definitions of relationships between several variables in statistical terms. Section 3 is devoted to examining a detailed example of a binary relation that involves correlation and thresholding, while section 4 gives a multilevel point of view on graphs that allows a characterization that includes threshold shifts. Section 5 puts forward tools from computational topology by extending work on binary relationships to n-ary relationships.

2 Organizing similarities in time series

The system under study consists of n units (we suppose n is large) whose activities are interdependent in an unknown manner. We assume this system can be observed by means of a finite set of scalar variables, whose values are indexed by discrete time instants. We hold those time-varying activities to be random and stationary. The outputs are time-dependent, stationary and continuous signals. The purpose of this article is to discuss the organization, rather than the explanation, of such data. In particular, we look for a representation of the overall interaction between these units. The meaning of the relationship between units will be rooted in statistical inference, since little is known about the processes, except their stationarity.

A first approach would be to define a function of n variables, whose behaviour would be examined and would reveal interdependence. But as n grows this method tends to become intractable. By limiting the output domain to {0,1}, one may state a satisfactory answer if we lay stress on the organization of a set of relationships (see sections 3–5).

Let u_i and u_j be two units whose activities are measured. To make things clear, "units u_i and u_j are related" is often understood as "correlated enough". However, this phrase does not say what it means for two rv to be "sufficiently correlated". The correlation coefficient is a meaningful measure of dependence

$$ \operatorname{corr}(X, Y) = \rho \triangleq E[XY] = E[g(X, Y)], \qquad g(x, y) = xy, \tag{1} $$

so that 0 ≤ ρ² ≤ 1, the upper bound being reached if X and Y are in strict linear functional relationship. Indeed, ρ is a coefficient of linear dependence, and it does not capture more complex forms of interdependence. It remains an open question which function of ρ should be used as a measure of interdependence: ρ² is more directly interpretable than ρ itself. On the other hand, since ρ = 0 does not imply independence, it is difficult to interpret ρ as a measure of interdependence.

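Example 1 (Naive correlations). Consider the random variables (rv) X ∼ N(0,1) and Y = X². The covariance of the two rv X and Y is

$$ c_{XY} = E[XY] - \bar{x}\,\bar{y} = \int_{-\infty}^{\infty} \frac{x^3}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 0, \tag{2} $$

where the integral vanishes because the integrand is odd-symmetric about x = 0. E[·] and x̄ respectively stand for the expectation and the mean of X. The two variables are thus uncorrelated although Y is a deterministic function of X.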
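The vanishing covariance of Example 1 can be checked numerically. The following sketch is not part of the original study: it draws samples with Python's standard library, and the sample size, the seed and the helper name `sample_corr` are arbitrary choices made for illustration.

```python
import random
from statistics import fmean

def sample_corr(xs, ys):
    """Empirical Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = fmean((x - mx) ** 2 for x in xs)
    vy = fmean((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)                     # fixed seed, illustrative only
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
ys = [x * x for x in xs]                   # Y = X^2 is fully determined by X

rho_xy = sample_corr(xs, ys)               # close to 0 despite the functional link
print(f"empirical corr(X, X^2) = {rho_xy:.4f}")
```

The empirical coefficient fluctuates around zero, which is why a thresholded correlation would declare these two variables unrelated.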

In the following we define R(·,·) as an indicator function for grounding statistically the statement "U_i and U_j are correlated enough":

$$ R_\epsilon(U_i, U_j) \iff \mathbb{1}_{\operatorname{corr}(U_i, U_j) > \epsilon}. \tag{3} $$

But this general statement says nothing about how U_i and U_j are related. A wide variety of other measures of correlation, with respect to tests for independence, is available, e.g. intraclass correlation, tetrachoric correlation, biserial correlation, etc. Daniels [6] defined a class of correlation coefficients based on the expression

$$ r_D = \frac{\sum_{i,j} a_{ij}\, b_{ij}}{\sqrt{\sum_{i,j} a_{ij}^2 \, \sum_{i,j} b_{ij}^2}}, \tag{4} $$

where a_ij and b_ij depend on the pairs (x_i, x_j) and (y_i, y_j), respectively. Though correlation constitutes a fundamental tool, it has important limitations: (i) the linearity of the functional link between rv, (ii) it deals only with two variables. Since our goal, expressed at the beginning of this section, is to account for the interdependences between the outputs of a great number of functions of m arguments (m ≤ n), we need to generalize this definition to m-ary relations.

Difficulties often arise when a variable is correlated with a set of variables. If we find that holding another variable fixed reduces the correlation between two other variables, we infer that their interdependence arises in part, i.e. conditionally, through this other variable. The corresponding measure is known as partial correlation. Conversely, if the partial correlation is larger than the original one, we infer that the other variable was masking the correlation. Remember that we cannot assume a causal connection. We refer to Scharf [11, p. 292] for demonstrations of the basic results.

Example 2 (Partial correlations). Suppose we have n observations on 3 (< n) variates

x_11, x_12, x_13, x_21, x_22, x_23, . . . , x_n1, x_n2, x_n3

that are multinormally distributed (such an assumption is not necessary but simplifies the development) and standardized. The conditional distribution of the pair (x_1, x_2)^T given x_3 is multinormal, so that

$$ \operatorname{corr}(x_1, x_2 \mid x_3) = \rho_{12|3} = \frac{\rho_{12} - \rho_{13}\,\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}}. \tag{5} $$

The extension of (5) to the conditional distribution of (x_1, x_2, x_3) given x_K, where x_K denotes any subset of (x_4, . . . , x_p), gives

$$ \operatorname{corr}(x_1, x_2 \mid x_3, x_K) = \frac{\rho_{12|K} - \rho_{13|K}\,\rho_{23|K}}{\sqrt{(1 - \rho_{13|K}^2)(1 - \rho_{23|K}^2)}}. \tag{6} $$

But most certainly, (6) says nothing simple about corr(X, Y) when corr(X, Y, Z) is greater than ε, and this hampers the rest of our approach, for reasons that will appear in the following.
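The first-order formula (5) of Example 2 is easy to evaluate by hand or by machine. The sketch below is only an illustration: the function name `partial_corr` and the numerical correlation values are hypothetical, chosen to show one case where the third variable explains the link and one where it masks it.

```python
from math import sqrt

def partial_corr(r12, r13, r23):
    """First-order partial correlation rho_{12|3}, following equation (5)."""
    return (r12 - r13 * r23) / sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

# Hypothetical values: x1 and x2 are both strongly tied to x3.
explained = partial_corr(0.5, 0.7, 0.7)     # about 0.02, far below rho_12 = 0.5

# Hypothetical values: x3 acts in opposite directions on x1 and x2.
masked = partial_corr(0.2, 0.6, -0.6)       # 0.875, far above rho_12 = 0.2

print(explained, masked)
```

In the first case holding x3 fixed almost cancels the correlation, in the second it reveals a correlation that x3 was hiding, which matches the two situations described before Example 2.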

As an alternative in the n-ary case, independence between variables may provide a solution, since independence between two continuous rv holds when P(X < x, Y < y) = P(X < x) P(Y < y), which can be generalized to n variables.

Stating that two units are related is half the job: given an intricate set of (measured) relations that hold between units, what do we learn from the overall activity? Knowledge Representation Theory usually builds a graph once a relation is defined among n units, as exemplified by Fig. 1. Then, this graph can be characterized in many ways and the mutual interactional structure can be analyzed and explained. However, graphs are limited to binary relations; since, as mentioned earlier, n-ary relations can be grounded in statistical inference, one can build hypergraphs out of continuous signals, as will be shown in section 4.

3 Organization of binary relations

In this section we limit the discussion to the case of binary relations between different units. As mentioned before, stating relation (3) is equivalent to associating a graph to the set of measured activities of the units.
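A minimal sketch of this association between relation (3) and a graph is given below. It is an illustration under stated assumptions, not the authors' implementation: the helper names `pearson` and `threshold_graph`, the toy signals and the threshold value are all hypothetical, and the test follows relation (3) literally (correlation above ε, without absolute value).

```python
from math import sqrt

def pearson(x, y):
    """Empirical correlation coefficient of two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def threshold_graph(signals, eps):
    """Edges (i, j), i < j, such that corr(signals[i], signals[j]) > eps."""
    n = len(signals)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if pearson(signals[i], signals[j]) > eps]

# Toy signals: unit 1 nearly follows unit 0, unit 2 oscillates independently.
signals = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [1.0, -1.0, 1.0, -1.0, 1.0],
]
edges = threshold_graph(signals, eps=0.5)
print(edges)
```

Because the relation is symmetric, listing each pair once with i < j is enough to describe the undirected graph.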

Now displaying an invariant of the interaction supposes giving a characterization of the set of relations as a whole; to do so we look for inspiration in classical tools from computer science, namely graph (spectral) theory and random matrix theory. The first step is naming the graphs we work with conveniently: we notice that the binary relation between two units is symmetric since R_ε(X, Y) = R_ε(Y, X). Consequently, vertices that represent the units in the graph can be either disconnected, or connected in an undirected way. Though many extensions can be imagined (see section 6), we will focus only on undirected graphs.

Among the many standard tools available from random graph theory (see [?]), we choose to depict graphs in terms of degree, which quantifies the typical number of connections of a given vertex. This quantity can be seen in a probabilistic context, for instance through the estimated probability density function (pdf) of the degree.

Fig. 1. From stationary time dependent signals to graph via correlation: the signals x1(t), x2(t), ..., xi(t), ..., xn(t) of the units u1, u2, u3, ..., ui, ..., un are linked when |C(xi, xj)| > ε.

The process of characterization of a graph using selected elements of random graph and random matrix theory is depicted by Fig.2; the following section now applies this scheme to real signals.
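The degree-based characterization can be sketched as follows. This is an illustrative snippet only: the function name `degree_distribution` and the toy edge list are assumptions, not taken from the experiments reported here.

```python
from collections import Counter

def degree_distribution(n_vertices, edges):
    """Empirical distribution k -> P(K = k) of the vertex degree."""
    degree = [0] * n_vertices
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree)
    return {k: c / n_vertices for k, c in sorted(counts.items())}

# Toy graph: a path 0-1-2 plus an isolated vertex 3.
dist = degree_distribution(4, [(0, 1), (1, 2)])
print(dist)
```

Isolated vertices are counted with degree 0, so the probabilities always sum to one over the whole vertex set.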

Fig. 2. Two characteristics computed from the graph: (up) the spectrum λ of the adjacency matrix of the graph, (down) the probability density function of the degree, k → P(K = k).

In that perspective, we aim at comparing graphs associated to the activity of different sets of units. Rather than specifying complex interdependence patterns between time-dependent activities of several units, we found it convenient to consider a stochastic process defined as a family (X_t), t ∈ I, of rv indexed by t taken from a continuous interval I, from which we extract a finite set of rv {X_t1, . . . , X_ti, . . . , X_tn}, as shown by Fig. 3. Thus, to each type of stochastic process is associated a set of vertices, whose relations can be computed from a finite number of realizations of the stochastic process over a finite time interval. For the sake of diversity, we generate graphs from different sorts of stochastic processes (random walk, long-range dependent process) as well as deterministic time series generated by a Lorenz system in chaotic regime.

Fig. 3. Realizations of the rv X_0, ..., X_i, ..., X_n taken from one realization of a Brownian stochastic process.

The question is the following: is there a type of process corresponding to a given type of random graph, such as Erdős-Rényi random graphs, small-world or scale-free graphs [?, ?, ?]? At first sight, graphs built from signals using R_ε look quite similar, as evidenced by Fig. 4.

Fig. 4. Graph topologies: (a) Brownian motion, (b) long-range dependent, (c) Lorenz, (d) Erdős-Rényi graph, (e) Small World graph. (a,b,c) are built from signals using R_ε, for values of ε ensuring that the number of edges (e = 60) and the number of nodes (n = 30) are uniform. On the contrary (d,e) are classical random graphs, with a probability of rewiring of 1/4 in the Small World case.

Hence we turn to a more quantitative comparison, as evoked earlier, with the degree distribution. Two possibilities were explored: the first one³ is to choose a signal and to generate a set of graphs for that type of signal, then to compute the degree distribution for each graph realization, and lastly to build a global histogram for each signal in order to approximate the underlying distribution. For Erdős-Rényi and Small World graphs, the theoretical distribution is known (see [?], [?]), and following the law of large numbers, the histogram converges in probability to the theoretical distribution. Accordingly, Fig. 5 displays empirical degree distributions generated from five types of graphs, three of which were built thanks to R_ε, the last two being Erdős-Rényi random and Small World graphs, as in Fig. 4.

3 The second method was to compare pairwise each realization of the degree distribution, with the help of a statistical test (e.g. Kolmogorov-Smirnov or Wilcoxon), and to count the number of positive tests, for all possible combinations (e.g. Lorenz generated graph compared to Small-World graph), but this method did not prove useful.

Fig. 5. Degree histograms (horizontal axis: degree). (a) Brownian motion, (b) Lorenz, (c) Long-range dependent noise, (d) Erdős-Rényi, (e) Small World. In all cases, graphs are composed of 50 nodes, however the number of edges varies: on the left part, 250 edges are present, against 1200 on the right side.

Before commenting on these results, we must remark that in random graph theory if one needs to compare two graphs from their properties, it may be necessary to ensure that their number of nodes (and edges) is of the same order of magnitude⁴.

Now, we noticed that for a given threshold ε used to make a graph out of a process, the number of edges depends on the type of signal (this fact is easily explained by the difference between autocorrelation functions). This disparity conflicts with the constraint just stated, that the numbers of edges and nodes should be approximately equal, so the degree distributions are not directly comparable. Consequently, before R_ε-induced graphs can be compared with tools from random graph theory, we must choose different values of ε depending on the type of signal, so that the number of edges is kept constant.
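One simple way to obtain a prescribed number of edges is to place the threshold between two consecutive sorted correlation values. The sketch below is an assumption about how this could be done, not a description of the procedure actually used for the figures; the function name `threshold_for_edge_count` and the toy correlation matrix are hypothetical.

```python
def threshold_for_edge_count(corr, n_edges):
    """Return eps such that exactly n_edges pairs (i < j) satisfy corr[i][j] > eps.

    Assumes the n_edges-th and (n_edges + 1)-th largest pairwise values
    are distinct; otherwise no threshold yields that exact count.
    """
    n = len(corr)
    values = sorted((corr[i][j] for i in range(n) for j in range(i + 1, n)),
                    reverse=True)
    if not 0 < n_edges < len(values):
        raise ValueError("n_edges must lie strictly between 0 and the number of pairs")
    kept, dropped = values[n_edges - 1], values[n_edges]
    if kept == dropped:
        raise ValueError("ties prevent an exact edge count")
    return (kept + dropped) / 2.0

# Hypothetical symmetric correlation matrix for four units.
corr = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.3],
    [0.4, 0.1, 0.3, 1.0],
]
eps = threshold_for_edge_count(corr, n_edges=2)
n_selected = sum(corr[i][j] > eps for i in range(4) for j in range(i + 1, 4))
print(eps, n_selected)
```

With this choice the threshold becomes a function of the signal type, while the edge count is the quantity held fixed across graphs.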

This limit being clearly exposed, we can now compare the degree distributions of different graphs. Fig. 5 shows two sets of curves, obtained for two distinct edge numbers, and we shall focus on the left part first. Three clusters can roughly be isolated: line (a), line (e), and lines (b,c,d). On the right part of the figure, we remark that two clusters can now be identified: curves (a,b,c) on one hand, and (d,e) on the other. This counterexample shows first that classical graphs such as Erdős-Rényi random graphs or small-world graphs hardly approximate the properties of R_ε-induced graphs, and second that depending on the threshold used to build graphs, their properties (the degree distribution in that case) are not constant.

4 "Consequently in random-graph theory the occupation probability is defined as a function of the system size: p represents the fraction of the edges that are present from the possible N(N−1)/2. Larger graphs with the same p will contain more edges, and consequently properties like the appearance of cycles could occur for smaller p in large graphs than in smaller ones. This means that for many properties Q in random graphs there is no unique, N-independent threshold, but we have to define a threshold function that depends on the system size", [?] p.55

4 Multilevel organization

Comparing graphs built with the same threshold is inappropriate because both the number of edges and the number of nodes matter. Considerable differences appear for the same graph at distinct levels of normalization, even when these conditions are standardized. A natural idea, developed in this section, is therefore to take into account the history of the graph when the normalization level moves.

We admit that for Erdős-Rényi random graphs the degree distribution obeys a binomial⁵ law B(n, p), which can be approximated by a normal law. Now, a normal law is completely described by its mean and standard deviation (µ, σ). Though approximating the distributions of the various graphs encountered so far by a normal law would deserve more careful justification, we admit this hypothesis just to illustrate the idea of characterizing the graph simultaneously at different normalization levels.

It is easy to distinguish graphs by simply focusing on the degree distribution. Figure 6 illustrates this by plotting the parametrized curves C: e → (µ(e), σ(e)) when the number of nodes N is kept constant but the number of edges e grows linearly: curves corresponding to different graphs are easily distinguished, even if they rely on a simple characteristic such as the degree distribution.
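The parametrized curve can be sketched for one graph family. The snippet below is only an illustration: it uses a uniform random graph with a fixed number of edges as a stand-in for the Erdős-Rényi case, and the seed, the helper name `degree_moments` and the edge-count grid are arbitrary assumptions. The signal-based graphs of this article are not reproduced.

```python
import random
from itertools import combinations
from statistics import fmean, pstdev

def degree_moments(n_nodes, edges):
    """Mean and standard deviation of the vertex degree."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return fmean(degree), pstdev(degree)

rng = random.Random(1)                       # fixed seed, illustrative only
n_nodes = 100
all_pairs = list(combinations(range(n_nodes), 2))

curve = []
for e in range(500, 2001, 500):
    edges = rng.sample(all_pairs, e)         # uniform graph with exactly e edges
    mu, sigma = degree_moments(n_nodes, edges)
    curve.append((e, mu, sigma))

for e, mu, sigma in curve:
    print(e, mu, round(sigma, 3))
```

Since every edge contributes to two degrees, the mean degree equals 2e/N for any graph with N nodes and e edges; in a sketch of this kind, what changes with the way edges are placed is therefore the spread σ(e).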

Fig. 6. Mean µ and standard deviation σ of the degree distributions, parametrized by the number of edges, e → (µ(e), σ(e)). e varies from 500 to 2000, while the number of nodes N remains equal to 100. (a) Brownian motion, (b) Lorenz, (c) Long-range dependent noise, (d) Erdős-Rényi, (e) Small World.

This confirms the dependence on thresholding evidenced by section 3, and shows the existence of an alternative position, based on embracing several levels of detail in a single representation. It appears clearly that not only the number of edges but also the number of nodes should be taken into account to build this multilevel representation, which raises the issue of hierarchical agglomeration of variables.

Here we do not deal with several phenomenologically distinct levels of description, and the objects we are concerned with remain the same even when the normalization conditions evolve. One can take advantage of the combinatorial nature of relations defined on a finite set of vertices to elaborate hierarchical multilevel approaches [?, 5].

5 In the case of an Erdős-Rényi random graph, n = N − 1 where N is the number of nodes, while p is the probability for two nodes to be connected, cf. [?] p.56

5 n-ary relations

In this section, we extend the framework presented so far to n-ary relations. The underlying idea is to identify a structure of relationships with a topological space. But this cannot be achieved directly: correlation is inappropriate to ground an n-ary relationship. Then, to meet computational requirements, we need to take advantage of algebraic topology, which allows algorithmic processing. To take into account the multilevel stand put forward in the previous section, we introduce filtrations. Lastly we give experimental results.

5.1 Causality in n-ary relations

Let us enumerate some possibilities offered by statistics to express n-ary relations. (??) is limited by the arity of the correlation function: the predominance of second-order moments is a consequence of the prevalence of the Gaussian distribution in models, if not in nature. Indeed:

A. the Gaussian distribution is completely described by its first two moments.

B. instead of describing an unknown distribution, it may seem more natural to first compare it to the normal law and to provide some distance from it.

One possibility would be to define an n-ary relation based on binary relations, e.g. (X1, . . . , Xn) are related if each couple (Xi, Xj) is related⁶, but in many cases pairwise relations say nothing about relations between, say, 3 variables.

Example 3 (Pairwise independent variables). Let X1 and X2 be two independent rv with values in {0,1}, each value having probability 1/2, and let X3 = X1X2 + (1−X1)(1−X2). X3 also takes values in {0,1} with probability 1/2, because P(X3=0) = P(X1=0)P(X2=1) + P(X1=1)P(X2=0) = 1/2 and P(X3=1) = 1 − P(X3=0) = 1/2. X1, X2, X3 are pairwise independent since:

P(X1=0, X3=0) = P(X1=0)P(X2=1) = 1/4 = P(X1=0)P(X3=0)
P(X1=0, X3=1) = P(X1=0)P(X2=0) = 1/4 = P(X1=0)P(X3=1)
P(X1=1, X3=0) = P(X1=1)P(X2=0) = 1/4 = P(X1=1)P(X3=0)
P(X1=1, X3=1) = P(X1=1)P(X2=1) = 1/4 = P(X1=1)P(X3=1)

Analogous equalities hold for X2, X3. However we have the relation

P(X1=0, X2=0, X3=0) = 0 ≠ P(X1=0)P(X2=0)P(X3=0) = 1/8.

Hence X1, X2, X3 are not independent.
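Example 3 can be verified by exhaustive enumeration. The sketch below assumes X3 = X1X2 + (1−X1)(1−X2), that is X3 = 1 exactly when X1 = X2, which is the reading consistent with the probabilities computed in the example; the helper name `prob` is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Joint law of (X1, X2, X3): X1, X2 fair and independent, X3 = 1 iff X1 == X2.
joint = {}
for x1, x2 in product((0, 1), repeat=2):
    x3 = x1 * x2 + (1 - x1) * (1 - x2)
    joint[(x1, x2, x3)] = half * half

NAMES = ("x1", "x2", "x3")

def prob(**fixed):
    """Probability that the named coordinates take the given values."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[NAMES.index(k)] == v for k, v in fixed.items()))

# Every pair of variables factorizes for every pair of values ...
pairwise = all(
    prob(**{a: u, b: v}) == prob(**{a: u}) * prob(**{b: v})
    for a, b in (("x1", "x2"), ("x1", "x3"), ("x2", "x3"))
    for u, v in product((0, 1), repeat=2)
)

# ... but the triple does not.
triple = prob(x1=0, x2=0, x3=0)
factorized = prob(x1=0) * prob(x2=0) * prob(x3=0)
print(pairwise, triple, factorized)
```

A relation built only from pairwise tests would therefore declare the three variables unrelated, although any two of them determine the third.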

Example 4 (Case of three pairwise independent rv). Consider a random vector X = (X1, X2, X3) uniformly distributed onto the tetrahedron whose vertices are the points {(0,0,0), (0,1,1), (1,0,1), (1,1,0)}, with the pdf

$$ f(x) = \frac{1}{2}\, \mathbb{1}_{[0,1]}(x_1)\, \mathbb{1}_{[0,1]}(x_2)\, \mathbb{1}_{[0,1]}(x_3) \left[ \delta(x_1 + x_2 - x_3) + \delta(x_2 + x_3 - x_1) + \delta(x_1 + x_2 + x_3 - 2) \right], \tag{7} $$

where 𝟙_[0,1] is the indicator function of the interval [0,1]. Any coordinate of this vector is uniformly distributed in the interval [0,1] and its projection onto the plane x1 + x2 + x3 = 0 is uniformly distributed inside [0,1]². Hence, the variables X1, X2 and X3 are pairwise independent. However they are dependent, because otherwise the distribution of X would be uniform inside the cube [0,1]³.

6 This is similar to the use of the Rips complex instead of the Čech complex in computational topology, see [?] for definitions.

A fundamental result of Information Theory is that a Gaussian variable has the largest entropy among all random variables of equal variance [9]; in other words, the Gaussian distribution is the "most random" or the least structured of all distributions. This means that entropy could be used as a measure of non-Gaussianity.
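The maximum-entropy property can be illustrated with closed-form differential entropies. The comparison below is a sketch added for illustration: the choice of the uniform and Laplace laws as competitors and the value of the common standard deviation are assumptions, and entropies are expressed in nats.

```python
from math import e, log, pi, sqrt

sigma = 1.0   # common standard deviation, arbitrary illustrative value

# Differential entropies (in nats) of three laws sharing the variance sigma^2.
h_gauss = 0.5 * log(2 * pi * e * sigma ** 2)   # Gaussian N(0, sigma^2)
h_uniform = log(sigma * sqrt(12))              # uniform on an interval of width sigma*sqrt(12)
h_laplace = 1 + log(2 * sigma / sqrt(2))       # Laplace with scale b = sigma/sqrt(2)

print(f"Gaussian {h_gauss:.4f}  Laplace {h_laplace:.4f}  uniform {h_uniform:.4f}")
```

The gap between the Gaussian entropy and the entropy of another law of equal variance is the kind of quantity that can serve as a measure of non-Gaussianity.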

A statistical relationship, however strong and suggestive, can never establish a causal connection: ideas on causation must come from outside statistics. For instance, we may be interested in whether there is a relationship between an alarm and an earthquake: put this way it is a problem of interdependence. But if we are interested in detecting the alarm to convey information about the earthquake, we are considering the dependence of the latter upon the former. This is clearly an asymmetrical relation: the earthquake 'causes' the alarm to activate, but we are certain that the alarm does not affect the earthquake, so we measure the dependence of the alarm upon the earthquake. Even if they were in perfect functional correspondence, we cannot reverse the "obvious" causal connection.

At this stage, we ought to define what we mean by cause. We shall content ourselves with the following definition: x is a cause of y if and only if the value of y can be changed by manipulating only x. The issue of causality cannot be overlooked, and the result of a statistical investigation can only lend support to a causal relationship. In regression analysis, it is reasonable to admit that changes in the dependent (or response) variables are caused by changes in the inputs. The notion of conditional independence has an important role to play in disentangling relationships between variables. Rubin [10] provides a framework for causal inference. Granger [8] describes a form of causality based on time ordering of the variables.

5.2 Organization of measured relationships

In this section, we devise tools capable of organizing the interdependence relationships between a number of variables, or the dependence of one or more variables upon others. Suppose that we have agreed to select a type of operator R_ε(. . .) that accepts k arguments (2 < k ≤ n) taken among the vertices {v1, . . . , vn}. We look for order and regularity in subsets of vertices that are related according to R_ε, for instance the following sets

R_ε(v1, v2)
R_ε(v2, v4, v5)
R_ε(v2, v_i, v_{n−1}, v_n)
...

In the same way that a set of pairs of vertices {(vi, vj) | i ∈ I, j ∈ J} defines a graph, the previous set of subsets defines a hypergraph, which extends graphs to larger dimensions. By analogy with ideas put forward in Knowledge Representation, where topological properties of relation-induced graphs are studied (i.e. number of connected components, graph connectivity⁷), we consider that set of subsets as a topological space and borrow relevant tools to examine it.

By "examining" we mean the search for an invariant that maps the same element to spaces that share the same topology. Invariants are often used via contrapositives: when two topological spaces have different invariants, their types differ. Nevertheless, if the invariant is the same, it might have insufficient discriminating power, and it is not guaranteed that the two spaces really are of the same topological type.

Simplicial homology

Simplicial homology theory provides us with such invariants, as will be exemplified by section 5.3. First, let us set some landmarks about simplicial homology and related fields [?]. The main idea here is to compare different spaces, to decide whether or not they are equivalent from the point of view of topology, and finally to constitute equivalence classes. Of course the meanings of "equivalence" are manifold:

7 The problem of graph connectivity is determining the smallest subset of vertices (or edges) whose deletion would disconnect the graph.

a. homeomorphy: let X and Y be two topological spaces. If there exists a continuous and bijective map f : X → Y such that f⁻¹ is continuous, then X and Y are said to be homeomorphic, and have the same topological type.

b. homotopy: the formal statement being counterintuitive, we settle for the following: X and Y are homotopy-equivalent if they can be transformed into one another by bending, shrinking and expansion.

c. homology: instead of working directly on spaces thanks to a map defined between them, homology introduces intermediate algebraic structures that correspond to the topological spaces (e.g. group structures in that case), so that from those algebraic structures, invariants can be built and compared as mentioned earlier.

d. simplicial homology: this form of homology is defined in a combinatorial setting (i.e. when the set of points that form the space is countable), more precisely for a particular type of topological space, namely simplicial complexes, that add constraints to the hypergraph structure.

Zomorodian [?] compares these different notions borrowed from topology and algebraic topology on the basis of their computational tractability, and focuses on simplicial homology.

Simplicial complexes

Now the price to pay for casting topological features of spaces in computational terms is to restrict the scope of possible topological spaces to simplicial complexes, which may be defined roughly as a countable set of vertices V = {vi}, i ∈ I, and a set of simplices that intersect along their faces. There are important additional requirements, but instead of giving an axiomatic presentation (see [?]), we state definitions in a more intuitive way:

• every simplex is constituted of faces. For example, the 2-simplex S = {v1, v2, v3} has the following faces, as illustrated by Fig. 7(a) in the case of a triangle:

{{v1}, {v2}, {v3}, {v1,v2}, {v2,v3}, {v1,v3}, {v1,v2,v3}}

• C1: if a simplex s belongs to a simplicial complex K then all its faces belong to K.

• C2: intersections in a simplicial complex must occur along shared faces, as shown by Fig. 7(b).
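Constraint C1 is purely combinatorial and can be checked mechanically. The following sketch represents simplices as sets of vertex labels; the function names `faces` and `satisfies_c1` are hypothetical, and the geometric constraint C2 is not addressed, since it concerns the embedding of the simplices rather than their vertex sets.

```python
from itertools import combinations

def faces(simplex):
    """All non-empty subsets of a simplex, the simplex itself included."""
    vertices = sorted(simplex)
    return {frozenset(c)
            for k in range(1, len(vertices) + 1)
            for c in combinations(vertices, k)}

def satisfies_c1(simplices):
    """True if every face of every listed simplex is also listed."""
    family = {frozenset(s) for s in simplices}
    return all(faces(s) <= family for s in family)

triangle = faces({"v1", "v2", "v3"})             # the 7 faces of the 2-simplex
broken = triangle - {frozenset({"v1", "v2"})}    # one edge removed, triangle kept

print(len(triangle), satisfies_c1(triangle), satisfies_c1(broken))
```

The second family fails the test because it still contains the triangle while one of its edges is missing, which is exactly the situation C1 rules out.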

So far we’ve first justified the choice of simplicial homology, before stating the condition to be met by the topological space under study. In section 5.3 we give a characterization method derived from the homological framework just depicted, then in section 5.4 we draw the consequences, in statistical terms, of constraintC1discussed above.

Fig. 7. (a) Faces: 2-simplex s = {v1,v2,v3}, and faces {{v1},{v2},{v3},{v1,v2},{v2,v3},{v1,v3},{v1,v2,v3}}. (b) Intersections: simplicial complex with (left) allowed (right) forbidden intersections.


Filtrations

A filtration is a growing sequence of simplicial subcomplexes of a complex K, as shown by Fig. 8. One way to describe it is to imagine a map from a continuous scalar space such as [0,1] to K: for each parameter value t we get a subcomplex of K, and as t increases continuously from 0 to 1 we first obtain an empty subcomplex to finally get the full complex K.

Fig. 8. Filtration at different parameter levels: (a) t = 0, (b) t = 0.5, (c) t = 1.

This structure is necessary to take into account the multilevel stand, which requires organizing the relations computed simultaneously at several threshold levels.
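A filtration driven by a threshold can be sketched as follows. This is an illustration under explicit assumptions: higher-order simplices are deduced from pairwise relations only (the Rips-like shortcut mentioned in footnote 6, not an n-ary relation grounded in statistical independence), the vertices are present at every level, and the matrix, the threshold levels and the function name `clique_complex` are hypothetical.

```python
from itertools import combinations

def clique_complex(corr, eps, max_size=3):
    """Simplices (frozensets of vertex indices) whose vertices are pairwise related."""
    n = len(corr)
    simplices = {frozenset({i}) for i in range(n)}
    for size in range(2, max_size + 1):
        for c in combinations(range(n), size):
            if all(corr[i][j] > eps for i, j in combinations(c, 2)):
                simplices.add(frozenset(c))
    return simplices

# Hypothetical symmetric correlation matrix for four units.
corr = [
    [1.0, 0.9, 0.7, 0.1],
    [0.9, 1.0, 0.8, 0.2],
    [0.7, 0.8, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

# Lowering the threshold can only add simplices: the sequence is nested.
levels = [0.95, 0.75, 0.5, 0.0]
filtration = [clique_complex(corr, eps) for eps in levels]
sizes = [len(k) for k in filtration]
nested = all(a <= b for a, b in zip(filtration, filtration[1:]))
print(sizes, nested)
```

Each level is a simplicial complex in its own right, and the inclusion between consecutive levels is what later allows features to be followed across thresholds.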

We review in section 5.3 some computational characteristics of simplicial complexes, taken from the field of computational topology, that allow one to compute invariants first for simplicial complexes, then for filtrations.

5.3 Characterization of simplicial complexes

Section 5.2 specifies the way to identify a set of n-ary relations with a topological space. Here, we aim at deriving computable characteristics of those spaces.

In experiments not reported in this article, we first intended to characterize simplicial complexes in a quite naive way, by computing the relative proportion of k-simplices in the complex for several threshold values; however this approach displayed little discriminative power. The second idea, which turned out not to be pertinent, was to generalize the idea of degree distribution to each simplex order: what is the probability for a vertex to be simultaneously part of exactly k_0 0-simplices, k_1 1-simplices, etc.? This would involve an n-dimensional probability distribution (depending on the maximum order allowed by the n-ary relation), parametrized by the threshold level. We therefore take advantage of the framework in which the set of relations is assimilated to a particular type of topological space before being characterized using simplicial homology theory.

In section 5.2, the framework relies on associating a group structure to each simplicial complex. Giving the details of that structure is far beyond the scope of this article. Suppose we deal with a simplicial complex in dimension 3; a way to characterize it is to count the number of voids enclosed inside the complex, and the number of tunnels that go through the space.

Fig. 9 illustrates this with two examples: an empty sphere and a torus. Intuitively, finding that these spaces are of different topological types seems obvious since one cannot be deformed continuously into the other; the homological way to state this is to note that the sphere encloses a void space, as does the torus, however there is a "tunnel" going through the torus, not through the sphere.

In the topological literature, the Betti numbers $\beta_k$ of order $k$ encode those invariant properties of spaces:

• $\beta_0$ can be interpreted⁸ as the number of connected components in the simplicial complex.

• $\beta_1$ is the number of tunnels enclosed by the space.

• $\beta_2$ is the number of voids enclosed by the space.

8 In dimension 3, for torsion-free spaces.

Fig. 9. Empty sphere and torus.
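The Betti numbers listed above can be checked on small triangulations. The following is a minimal sketch, not the authors' implementation: it assumes real coefficients (so torsion is ignored), computes ranks of boundary matrices with NumPy, and uses two illustrative inputs that are not taken from the article, namely the boundary of a tetrahedron as a triangulated empty sphere and the classical 7-vertex triangulation of the torus. The helper names `faces_by_dimension`, `boundary_matrix` and `betti_numbers` are hypothetical.

```python
from itertools import combinations
import numpy as np

def faces_by_dimension(maximal_simplices):
    """Return {dim: sorted list of simplices} for the closure of the input."""
    faces = set()
    for simplex in maximal_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            faces.update(combinations(simplex, size))
    by_dim = {}
    for face in faces:
        by_dim.setdefault(len(face) - 1, []).append(face)
    return {dim: sorted(fs) for dim, fs in by_dim.items()}

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary operator from k-chains to (k-1)-chains."""
    row = {s: i for i, s in enumerate(km1_simplices)}
    mat = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]   # drop the i-th vertex
            mat[row[face], j] = (-1) ** i          # alternating sign
    return mat

def betti_numbers(maximal_simplices):
    """beta_k = dim ker(d_k) - rank(d_{k+1}), over the reals."""
    cx = faces_by_dimension(maximal_simplices)
    top = max(cx)
    rank = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        rank[k] = int(np.linalg.matrix_rank(boundary_matrix(cx[k], cx[k - 1])))
    return [len(cx[k]) - rank[k] - rank[k + 1] for k in range(top + 1)]

# Empty sphere: the four triangular faces of a tetrahedron.
sphere = list(combinations(range(4), 3))

# Torus: 7-vertex triangulation, triangles {i, i+1, i+3} and {i, i+2, i+3} mod 7.
torus = [((i) % 7, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
torus += [((i) % 7, (i + 2) % 7, (i + 3) % 7) for i in range(7)]

sphere_betti = betti_numbers(sphere)
torus_betti = betti_numbers(torus)
print("sphere:", sphere_betti, "torus:", torus_betti)
```

Both spaces have one connected component and enclose one void, and only the torus has nonzero $\beta_1$, which is the distinction drawn in the text.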

Now recall from section 4 that we adopted a multilevel stance to cope with the parametrization of relations. Thus, instead of organizing a set of relations at a given threshold level, we take several levels into account simultaneously and build a filtration, as mentioned in section 5.2. The last step is thus to adapt the characterization of a simplicial complex to the case of a filtration. This was achieved by Edelsbrunner et al. in [?], and led to the notion of persistent homology, which captures long-living Betti numbers as the continuous value that parametrizes the simplicial complex in the filtration is varied.
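To make the idea of long-living Betti numbers concrete, here is a minimal sketch restricted to $\beta_0$. It is not the general algorithm of Edelsbrunner et al.: it only tracks connected components with a union-find structure, assuming that all vertices are present from the start and that each edge enters the filtration at a given birth value. The function names and the toy edge list are hypothetical.

```python
def component_deaths(n_vertices, weighted_edges):
    """Birth values at which two connected components merge.

    weighted_edges is a list of (birth, u, v) tuples; edges are processed
    by increasing birth value, as in a filtration.
    """
    parent = list(range(n_vertices))

    def find(v):
        # Find the representative of v, with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    deaths = []
    for birth, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # the edge joins two components
            parent[root_u] = root_v
            deaths.append(birth)      # one component dies at this level
    return deaths

def beta0_at(n_vertices, deaths, threshold):
    """Number of connected components once all edges born <= threshold are in."""
    return n_vertices - sum(1 for d in deaths if d <= threshold)

# Toy filtration on 4 vertices (illustrative values only).
edges = [(0.10, 0, 1), (0.20, 1, 2), (0.15, 0, 2), (0.80, 2, 3)]
deaths = component_deaths(4, edges)
print(deaths, [beta0_at(4, deaths, t) for t in (0.05, 0.5, 0.9)])
```

In this toy example one component survives from the start of the filtration up to the level 0.8, well after the others have merged; such long intervals are what persistence is meant to single out.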

Persistent homology will be exemplified in section 5.5; we now turn to ensuring compatibility between constraint C1, imposed by the structure of simplicial complexes in section 5.2, and the statistical grounding of relations as in section 2.

5.4 From correlatedness to independence

As already suggested, independence is a much stronger property than uncorrelatedness. This can be stated by saying that independence implies nonlinear uncorrelatedness: if $x_1$ and $x_2$ are independent rv, then any nonlinear transformations $g(x_1)$ and $h(x_2)$ are uncorrelated. Mathematically, statistical independence is defined in terms of probability densities [9]. Informally, $X$ is independent of $Y$ if knowing the value of $Y$ does not give any information on the value of $X$. In other words, the joint density $p_{X,Y}(x,y)$ must factorize into the product of the marginal densities $p_X(x)$ and $p_Y(y)$. Uncorrelated jointly Gaussian rv are also independent, a property which is not shared by other distributions in general.
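The gap between uncorrelatedness and independence can be illustrated numerically. The sketch below is an illustrative assumption, not an experiment from the article: it takes a deterministic symmetric grid as a stand-in for a sample of a uniform rv on $[-1, 1]$ and sets $y = x^2$, so that $y$ is entirely determined by $x$. The linear correlation vanishes, while the nonlinear transformation $g(x) = x^2$ reveals the dependence.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation coefficient between two samples."""
    return float(np.corrcoef(a, b)[0, 1])

# Symmetric grid on [-1, 1], standing in for a uniform sample (assumption).
x = np.linspace(-1.0, 1.0, 2001)
y = x ** 2                      # y is a deterministic function of x

linear = corr(x, y)             # x and y are uncorrelated ...
nonlinear = corr(x ** 2, y)     # ... but g(x) = x^2 and h(y) = y are not

print(f"corr(x, y) = {linear:.2e}, corr(x^2, y) = {nonlinear:.3f}")
```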

Mutual information is a measure of the information that the members of a set of rv have on the other rv in the set. Using entropy, we can define the mutual information $I$ between $n$ rv $X_1, \dots, X_n$ as follows:

$$I(x_1, \dots, x_n) = \sum_{i=1}^{n} H(x_i) - H(x), \qquad (8)$$

where $x$ is the vector containing all the $x_i$. Mutual information can be interpreted by using the interpretation of entropy as code length. The terms $H(x_i)$ give the lengths of code for the $x_i$ when these are coded separately, and $H(x)$ gives the code length when $x$ is coded as a random vector, i.e. all the components are coded in the same code. Mutual information thus shows what code length reduction is obtained by coding the whole vector instead of the separate components. In general, better codes can be obtained by coding the whole vector. However, if the $x_i$ are independent, they give no information on each other, and coding the whole vector brings no reduction.
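Definition (8) can be evaluated directly when the rv are discrete and their joint probability table is known. The sketch below makes that assumption (the article deals with continuous densities, which require estimation), measures entropies in bits, and uses hypothetical function names.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                       # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

def mutual_information_bits(joint):
    """Equation (8): sum of marginal entropies minus the joint entropy.

    joint[i1, ..., in] is the probability of (x1 = i1, ..., xn = in).
    """
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    marginal_sum = 0.0
    for axis in range(n):
        other_axes = tuple(a for a in range(n) if a != axis)
        marginal_sum += entropy_bits(joint.sum(axis=other_axes))
    return marginal_sum - entropy_bits(joint)

# Two independent fair bits: coding them jointly saves nothing.
independent = np.full((2, 2), 0.25)

# Two identical fair bits: coding them jointly saves one bit.
copies = np.array([[0.5, 0.0], [0.0, 0.5]])

# Three identical fair bits: 3 bits separately, 1 bit jointly.
triple = np.zeros((2, 2, 2))
triple[0, 0, 0] = triple[1, 1, 1] = 0.5

mi_values = [mutual_information_bits(t) for t in (independent, copies, triple)]
print(mi_values)
```

The three values are the code length reductions, in bits, obtained by coding each vector as a whole rather than component by component.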

Alternatively, mutual information can be interpreted as a distance between two probability densities: like any Kullback-Leibler divergence, it is always nonnegative, and zero if and only if the two densities are equal. Thus one might measure the independence of the $X_i$ as the Kullback-Leibler divergence between the real density $p_X(x)$ and the factorized density $p_{X_1}(x_1) \cdots p_{X_n}(x_n)$.

Moreover, if $X_1, \dots, X_n$ are independent, any subsequence extracted from them also forms an independent set of variables.

If the space of multidimensional densities comes equipped with a metric structure then, from (3), the $n$-ary relation based on ‘approximate’ independence can be stated as follows:

$$R_\epsilon(X_1, \dots, X_n) \iff I(x_1, \dots, x_n) < \epsilon. \qquad (9)$$

This definition is suitable since (9) respects the following conditions:

(i) variable arity: $R_\epsilon(X_1, \dots, X_k)$ is well-defined for $k \le n$.

(ii) for every subsequence $(X_i, \dots, X_j) \subseteq (X_1, \dots, X_k)$, $R_\epsilon(X_1, \dots, X_k) \Rightarrow R_\epsilon(X_i, \dots, X_j)$.

(iii) computational tractability.
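Condition (ii) can be checked numerically in a case where (8) has a closed form. The sketch below assumes jointly Gaussian rv, for which the mutual information (8) equals $-\frac{1}{2}\log\det R$, with $R$ the correlation matrix of the selected variables. This Gaussian assumption is only a convenient stand-in and is not the estimator used in the article; the covariance matrix and function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def gaussian_mutual_information(cov, idx):
    """Equation (8) in nats for the Gaussian rv indexed by idx."""
    idx = list(idx)
    sub = cov[np.ix_(idx, idx)]
    std = np.sqrt(np.diag(sub))
    corr = sub / np.outer(std, std)
    _, logdet = np.linalg.slogdet(corr)
    return max(0.0, -0.5 * logdet)     # clip tiny negative rounding errors

def relation_holds(cov, idx, eps):
    """Relation (9): the variables are 'approximately' independent."""
    return gaussian_mutual_information(cov, idx) < eps

# Illustrative covariance matrix of three weakly correlated rv.
cov = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 1.0]])

full = (0, 1, 2)
full_mi = gaussian_mutual_information(cov, full)
sub_mi = {s: gaussian_mutual_information(cov, s)
          for size in (1, 2) for s in combinations(full, size)}

# Condition (ii): if the relation holds for the triple, it holds for all subsets.
eps = 0.07
closed_under_subsets = (not relation_holds(cov, full, eps)) or all(
    relation_holds(cov, s, eps) for s in sub_mi)
print(round(full_mi, 4), closed_under_subsets)
```

In this example every subset has a mutual information no larger than that of the full triple, which is what makes the thresholded relation compatible with the face-closure requirement of simplicial complexes.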

5.5 Experimental results

The mutual information is a function of densities. This makes its estimation much more complicated, because the estimation of densities is, in general, a nonparametric problem: it cannot be reduced to the estimation of a finite parameter set. Nonparametric estimation of densities is known to be a difficult problem. One way to solve the problem of density estimation is to approximate the densities of the components by a family of densities that are specified by a finite number of parameters⁹. For instance, we consider the following log-densities:

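$$\log p^{+}(x) = \alpha_1 - 2 \log\cosh(x) \qquad (10)$$

$$\log p^{-}(x) = \alpha_2 - \left[\frac{x^2}{2} - \log\cosh(x)\right] \qquad (11)$$

where $\alpha_1, \alpha_2$ are constants fixed so as to make $p^{-}$ and $p^{+}$ probability densities. $p^{-}$ is subgaussian whereas $p^{+}$ is supergaussian.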
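The roles of $\alpha_1$ and $\alpha_2$ and the sub/supergaussian character of the two densities can be verified numerically. The following sketch is an illustration under stated assumptions rather than part of the article's method: it integrates on a fine grid over $[-40, 40]$ (an arbitrary choice), uses excess kurtosis as the indicator of sub- or supergaussianity, and compares the constants with the closed forms $\alpha_1 = -\log 2$ and $\alpha_2 = -\frac{1}{2}(1 + \log 2\pi)$ that follow from normalizing (10) and (11). Function names are hypothetical.

```python
import numpy as np

def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    return np.logaddexp(x, -x) - np.log(2.0)

def unnormalized_log_p_plus(x):
    return -2.0 * log_cosh(x)               # equation (10) without alpha_1

def unnormalized_log_p_minus(x):
    return -(0.5 * x ** 2 - log_cosh(x))    # equation (11) without alpha_2

def alpha_and_excess_kurtosis(unnormalized_log_p, lo=-40.0, hi=40.0, n=400001):
    """Normalizing constant alpha and excess kurtosis, by grid integration."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    w = np.exp(unnormalized_log_p(x))
    z = np.sum(w) * dx                      # integral of the unnormalized density
    p = w / z
    m2 = np.sum(x ** 2 * p) * dx            # both densities are symmetric, mean 0
    m4 = np.sum(x ** 4 * p) * dx
    return -np.log(z), m4 / m2 ** 2 - 3.0

alpha_1, kurt_plus = alpha_and_excess_kurtosis(unnormalized_log_p_plus)
alpha_2, kurt_minus = alpha_and_excess_kurtosis(unnormalized_log_p_minus)
print(alpha_1, kurt_plus, alpha_2, kurt_minus)
```

A positive excess kurtosis for $p^{+}$ and a negative one for $p^{-}$ agree with the supergaussian and subgaussian labels given in the text.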

Densities could be estimated using basic density estimation methods such as kernel estimators. Such a simple approach would be very error prone, however: the estimator depends on the correct choice of the kernel parameters, requires many samples, and is computationally rather complicated for a large number of dimensions [?].

The validity of the approach can be assessed from the details of the experiments discussed below, which were carried out using this method¹⁰. At this stage, it is therefore legitimate to add the simplex whose vertices correspond to the rv $X_1, \dots, X_i$ to the simplicial complex $C$. It is then a simple matter of iteration to achieve the incremental building of the simplicial complex.
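The incremental construction can be sketched as follows for a single threshold. This is an illustration under assumptions, not the authors' code: the mutual information is computed with the Gaussian closed form $-\frac{1}{2}\log\det R$ instead of the parametric density approximation used in the article, the covariance matrix is invented, and the complex is limited to 2-simplices. A candidate simplex is tested only when all its faces are already in the complex, so the result is closed under taking faces.

```python
from itertools import combinations
import numpy as np

def gaussian_mutual_information(cov, idx):
    """Mutual information (nats) of jointly Gaussian rv: -0.5 * log det R."""
    idx = list(idx)
    sub = cov[np.ix_(idx, idx)]
    std = np.sqrt(np.diag(sub))
    _, logdet = np.linalg.slogdet(sub / np.outer(std, std))
    return max(0.0, -0.5 * logdet)

def build_complex(cov, eps, max_dim=2):
    """Add simplices of increasing dimension whose vertices satisfy I < eps."""
    n = cov.shape[0]
    complex_ = {(i,) for i in range(n)}           # every rv is a vertex
    for size in range(2, max_dim + 2):            # size = number of vertices
        for candidate in combinations(range(n), size):
            faces_present = all(face in complex_
                                for face in combinations(candidate, size - 1))
            if faces_present and gaussian_mutual_information(cov, candidate) < eps:
                complex_.add(candidate)
    return complex_

cov = np.array([[1.0, 0.3, 0.1],
                [0.3, 1.0, 0.2],
                [0.1, 0.2, 1.0]])

complexes = {eps: build_complex(cov, eps) for eps in (0.03, 0.05, 0.07)}
for eps, cx in complexes.items():
    print(eps, sorted(cx, key=lambda s: (len(s), s)))
```

With these illustrative values, raising the threshold first adds the remaining edge and then the triangle, so the three complexes are nested, as required for a filtration.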

Fig. 10. Simplicial complexes extracted from the filtration for different threshold values, limited to 2-simplices: (a) $\epsilon = 0.3$, (b) $\epsilon = 0.5$, (c) $\epsilon = 0.9$.

Instead of iterating the building process depicted above for each distance threshold, we directly store the distances for each possible combination $(X_1, \dots, X_i) \subseteq (X_1, \dots, X_n)$. These distances play the role of the birth dates needed to specify the filtration structure. Consequently, we get the simplicial complex corresponding to a given threshold value by just extracting it from the filtration, as illustrated by Figure 10 for a series of arbitrary threshold values, at a

9 Classical approximation by cumulants (e.g. the Edgeworth expansion, when a Gaussian distribution is assumed) is computationally very difficult.

10 Source code made available by [?].