Statistical metrics for the characterization of karst network geometry and topology


Pauline Collon a,*, David Bernasconi b, Cécile Vuilleumier a,b, Philippe Renard b

a GeoRessources UMR7359, Université de Lorraine, CNRS, CREGU, ENSG, 2 rue du doyen Marcel Roubault, TSA 70605, Vandœuvre-lès-Nancy Cedex 54518, France
b Centre of Hydrogeology and Geothermics, University of Neuchâtel, 11 Rue Emile Argand, Neuchâtel CH-2000, Switzerland

Keywords: Karst characterization Topology Geometry Graph theory Database Karst pattern

Abstract

Statistical metrics can be used to analyse the morphology of natural or simulated karst systems; they allow describing, comparing, and quantifying their geometry and topology. In this paper, we present and discuss a set of such metrics. We study their properties and their usefulness based on a set of more than 30 karstic networks mapped by speleologists. The data set includes some of the largest explored cave systems in the world and represents a broad range of geological and speleogenetic conditions, allowing us to test the proposed metrics, their variability, and their usefulness for the discrimination of different morphologies. All the proposed metrics require that the topographical survey of the caves is first converted to graphs consisting of vertices and edges. This data preprocessing includes several quality check operations and some corrections to ensure that the karst is represented as accurately as possible. The statistical parameters relating to the geometry of the system are then directly computed on the graphs, while the topological parameters are computed on a reduced version of the network focusing only on its structure.

Among the tested metrics, we include some that were previously proposed, such as tortuosity or Howard's coefficients. We also investigate the possibility of using new metrics derived from graph theory. In total, 21 metrics are introduced, discussed in detail, and compared on the basis of our data set. This work shows that orientation analysis and, in particular, the entropy of the orientation data can help to detect the existence of inception features. The statistics on branch length are useful to describe the extension of the conduits within the network. Rather surprisingly, the tortuosity does not vary very significantly. It could be heavily influenced by the survey methodology. The degree of interconnectivity of the network, related to the presence of maze patterns, can be measured using different metrics such as Howard's parameters, the global cyclic coefficient, or the average vertex degree. The average vertex degree of the reduced graph proved to be the most useful as it is simple to compute, it properly discriminates the interconnected systems (mazes) from the acyclic ones (tree-like structures), and it permits us to classify the acyclic systems as a function of the total number of branches. This topological information is completed by three parameters, allowing us to refine the description. The correlation of vertex degree is rather simple to obtain. It is systematically positive on all studied data sets, indicating a predominance of assortative networks among karst systems. The average shortest path length is related to the transport efficiency. It is shown to be mainly correlated to the size of the network. Finally, central point dominance allows us to identify the presence of a centralized organization.

*Corresponding author.
E-mail addresses: pauline.collon@univ-lorraine.fr (P. Collon), bernasconi.david@gmail.com (D. Bernasconi), cecile.vuilleumier@unine.ch (C. Vuilleumier), philippe.renard@unine.ch (P. Renard).

1. Introduction

Several studies have shown the importance of karst network geometry for understanding flow and transport in karstic aquifers (Jeannin, 2001; Chen and Goldscheider, 2014). But in many cases, the network geometry remains partially, or even totally, unknown. Various simulation methods have thus been developed to tackle this problem. They can be conditioned to field observations and to a partial knowledge of the network that drive orientations and/or length of the conduits, but they hardly consider the global network architecture (e.g., Jaquet et al., 2004; Labourdette et al., 2007; Borghi et al., 2012; Collon-Drouaillet et al., 2012; Pardo-Iguzquiza et al., 2012; Viseur et al., 2014; Hendrick and Renard, 2016). Metrics that would characterize karstic networks are thus required (i) to compare artificial and natural networks, and possibly, (ii) to better parameterize the simulation methods.

First attempts to quantitatively characterize karst networks date back to the 1960s (e.g., Curl, 1966; Howard, 1971; Williams and Williams, 1972) and were strongly supported by parallel investigations on river networks (e.g., Horton, 1945; Scheidegger, 1966; Scheidegger, 1967; Woldenberg, 1966; Smart, 1969; Howard et al., 1970). Systematic analysis was then mainly done on two-dimensional planar maps or vertical cross sections produced by speleological explorations (Howard, 1971).

But over the last decade, new exploration and survey tools have emerged that ease the acquisition, storage, and sharing of three-dimensional data. LIDAR techniques and aerial photography now allow us to rapidly map surface evidence of karst presence (Weishampel et al., 2011; Alexander et al., 2013; Zhu et al., 2014). LIDAR technology is also used underground and now allows 3D morphological analysis of small portions of conduits and drains (Ployon et al., 2011; Jaillet et al., 2011; Sadier, 2013). Underground GPS systems are progressively developing (Caverne, 2011), and one can hope that they will facilitate the acquisition of regular 3D data sets on karstic systems, as will the optical laser device recently tested in Yucatan (Mexico; Schiller and Renard, 2016). Combined with the increasing power of computers, these recent advances have renewed interest in the statistical analysis of karsts, now considering the 3D nature of these systems (Pardo-Iguzquiza et al., 2011; Piccini, 2011; Fournillon et al., 2012). But in general, these studies focus on the geometrical characterization of the networks, and they rarely compute metrics on more than one or two examples.

In parallel, topological analysis of networks has had an explosive growth, and many new metrics have been proposed that have not yet been applied on natural networks such as karstic systems (Ravasz and Barabasi, 2002; Boccaletti et al., 2006; Costa et al., 2007).

The goals of this paper are (i) to propose a set of metrics to characterize both the geometry and the topology of karstic networks and (ii) to provide a data set of the corresponding values computed on a large ensemble of karstic systems.

With these objectives, we introduce several metrics from graph theory that have not yet been used, to our knowledge, in this context. We compare them with other metrics introduced by previous authors. For each metric, we provide, in addition to its formal definition, some examples on simple networks to help in getting an intuitive understanding of its meaning. We then compute all the metrics on a data set of 34 cave systems gathered thanks to the help of speleologists: 31 networks are real 3D data sets, while 3 come from 2D projection maps of Palmer (1991). The analysis of these results permits us to compare the metrics, analyse potential correlations, and discuss their relevance for the quantification of karst geomorphology.

2. From real networks to graphs

2.1. Data acquisition

A solid statistical analysis would require a data set as large as possible of precise, complete, regular, and homogeneous measurements. Cave mapping is technically difficult and relies on the long-term work of trained speleologists. To our knowledge, no centralized database inventories all explored caves and gives open access to the primary 3D data. To carry out this study, various speleologists have been independently contacted and have agreed to share their data. Thus, we collected 31 three-dimensional karst networks from various locations in the world (Table 1). Three 2D networks were also used in the study and complete the database: Blue Spring, Crevice, and Crossroads (USA). Note that four parts of the Sieben Hengste karst (Switzerland) were provided and studied. Subparts SP1 and SP2 are included in the LargePart network. The UpPart is an independent one, located upstream from the LargePart network. We have neither split nor merged any network parts as we had no accurate information indicating if it should be done and how.

The total explored length of these networks strongly varies: from small networks like Pic du Jer in France, with a total length of around 612 m, to very large ones like Ox Bel Ha in Mexico, with a total explored length of 143 km. The variety of morphologies of the gathered networks is also interesting to notice. Some are principally developed around some horizons, giving them a close to 2D architecture (e.g., Agen Allwed, Foussoubie Goule, Ox Bel Ha), others are characterized by their vertical elongation (e.g., Krubera and Ratasse), and some have developed equally in the three dimensions of space (e.g., Mammuthöhle, Sakany, and Sieben Hengste SP1; Fig. 1).

The three 2D networks come from an automated digitalization of 2D maps published by Palmer (1991). Although the data format changes from one source to another, all 3D karst networks mapped by speleologists are available as a sequence of n topographic stations i = 1, 2, ..., n referring to a given origin point i = 1 (e.g., an entrance of the considered cave; Fig. 2A). In general, cave survey data consist of such series of uniquely defined stations linked to each other by lines-of-sight (e.g., Jeannin et al., 2007; Pardo-Iguzquiza et al., 2011). Surveying methodology varies, but, in general, one line-of-sight, c_{i,j}, linking two consecutive stations i and j, is defined by (i) a distance measured with a low-stretch tape or laser range-finder, (ii) a direction (azimuth or bearing) taken with a compass, and (iii) an inclination from horizontal taken with a clinometer (Fig. 2B). Most of the time, but not always, the maximum height and the maximum width of the passage are also measured at each station. Sometimes, a more detailed distance to the surrounding walls is provided through left, right, up, and down measurements (e.g., Jaillet et al., 2011; Rongier et al., 2014).

Despite the recent efforts of the speleological community to homogenise their cave survey methodology and take increasing care of the precision and validity of the ﬁeld measurements, some survey errors can still appear and are more or less easily detectable:

• Back-sight measures: a station can be measured twice when the surveyors return in the opposite direction but continue their data acquisition. This can be easily detected when the station is strictly identical, but if the measurement is made just nearby, it can be interpreted as a cycle, also called a passage loop (Fig. 3, case A).

• Cycle closure errors appear when a gap, even small, is observed between the first and the last stations of a cycle. This happens especially when cumulative inaccuracies are registered during the cycle survey without a final rectification by speleologists. This can be detected by a neighbourhood distance scanning around each point (Fig. 3, case B).

• Missing connections are similar to cycle closure errors. This division of a junction survey station into two points separated by a small gap appears at junctions, notably when the explorations of the joining conduits were performed at different times and/or by different teams.

Moreover, survey stations are chosen by speleologists for their ease of access and clear sight along the cave passage. This choice has several consequences:

• Karstic networks are not regularly sampled. Additional samples can affect the apparent topology (Fig. 3, case F) of the network. Some small meandering of the conduits could be ignored during the acquisition if a lower sampling resolution is chosen (Fig. 3, case C).

• Stations are not located on the mathematical central line of the conduit, i.e., they are not located at the middle of the conduit section. Placing the stations along the conduit walls can emphasize the apparent sinuosity of the conduit (Fig. 3, case C).


Table 1
Karst dataset: L is the total mapped length (in kilometers); DZ is the vertical extension of the system (in meters).

| Name | Location | L (km) | DZ (m) |
|---|---|---|---|
| AgenAllwed | South Wales, UK | 13.7 | 122 |
| ArphidiaRobinet | France | 13.5 | 634 |
| Arrestelia | France | 60.9 | 827 |
| BlueSpring (2D) | Tennessee, USA | 8.00 | – |
| Ceberi | France | 7.20 | 310 |
| CharentaisHeche | France | 13.5 | 433 |
| ClydachGorge Caf1 | South Wales, UK | 2.30 | 112 |
| ClydachGorge OgofCapel | South Wales, UK | 0.80 | 30 |
| Crevice (2D) | Missouri, USA | 4.90 | – |
| Crossroads (2D) | Virginia, USA | 7.80 | – |
| DarenCilau | South Wales, UK | 20.2 | 186 |
| EglwysFaen | South Wales, UK | 1.40 | 19 |
| FoussoubieEvent | France | 2.60 | 130 |
| FoussoubieGoule | France | 20.2 | 129 |
| GenieBraque | France | 2.70 | 229 |
| GrottesDuRoy | France | 3.80 | 330 |
| HanSurLesse | Belgium | 9.80 | 30 |
| Krubera | Georgia | 13.2 | 2191 |
| Lechuguilla | New Mexico, USA | 329 | 1257 |
| Llangattwg | South Wales, UK | 0.90 | 30 |
| Mammuthöhle | Austria | 63.9 | 1202 |
| Monachou | France | 0.70 | 33 |
| OjoDelAgua | Cuba | 12.3 | 91 |
| OxBelHa | Mexico | 143 | 29 |
| PicDuJer | France | 0.60 | 67 |
| Ratasse | France | 3.60 | 445 |
| SaintMarcel | France | 56.5 | 275 |
| Sakany | France | 7.50 | 141 |
| Shuanghe | China | 130 | 593 |
| SiebenHengsteLargePart | Switzerland | 82.2 | 988 |
| SiebenHengsteUpPart | Switzerland | 3.30 | 131 |
| SiebenHengsteSP1 | Switzerland | 7.50 | 286 |
| SiebenHengsteSP2 | Switzerland | 6.80 | 239 |
| Wakulla | Florida, USA | 17.8 | 93 |

• Several stations can be set up in one single big cave in order to get a correct representation of its scale and to better map it. The result is either a cycle or some sun-ray shape of what should have been registered only as a point (Fig. 3, case D).

• No genetic consideration is made in the definition of a line-of-sight; thus, owing to a change of the genetic phase, the form of the conduit can change radically between two survey stations without being exhaustively recorded and located.

• Finally, we cannot be sure of the completeness of the data: some conduits are not mapped because (i) they have not been explored yet, and/or (ii) they are not accessible, being too small or drowned over long distances (Fig. 3, case E).

These approximations, of no consequence for speleological exploration, can affect the results of a systematic shape analysis. Two preprocessing steps have been performed to limit the survey errors. First, we have implemented a tool to correct the most common cases of missing connections. If an extremity vertex is closer than a user-defined tolerance distance to a neighbouring vertex or segment, the tool creates a new link between them. The case of two intermediate segments close to each other in 3D, and thus possibly defining a junction, has been ignored: at this stage, we suppose that crossroads are important enough for speleologists to ensure that a station would have been defined on each real junction. This assumption has been verified on the 34 cave survey data sets we treated. Second, to deal with additional cycles that may have been generated by the previous treatment, we suppressed cycles smaller than a user-defined tolerance sphere. These corrections do not cover all the cases mentioned above, but deal with the most automatically detectable errors.
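The first correction step can be illustrated with a minimal Python sketch. It is not the authors' tool: the function name `connect_close_extremities`, the data layout (a list of coordinate tuples and a list of index pairs), and the restriction to vertex-to-vertex snapping are assumptions made for illustration. Vertex-to-segment snapping and the removal of small cycles are deliberately left out.

```python
import math
from collections import defaultdict


def connect_close_extremities(coords, edges, tol):
    """Link each extremity (degree-1) vertex to its nearest non-adjacent
    vertex lying within the tolerance distance `tol`.

    coords: list of (x, y, z) tuples; edges: list of (i, j) index pairs.
    Returns a new edge list; the input is left untouched.
    Hypothetical helper: vertex-to-segment snapping is not handled.
    """
    neighbours = defaultdict(set)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)

    new_edges = list(edges)
    existing = {frozenset(e) for e in edges}
    for v in range(len(coords)):
        if len(neighbours[v]) != 1:
            continue  # only extremity vertices are candidates
        best, best_d = None, tol
        for u in range(len(coords)):
            if u == v or u in neighbours[v]:
                continue  # skip the vertex itself and its survey neighbour
            d = math.dist(coords[v], coords[u])
            if d <= best_d:
                best, best_d = u, d
        if best is not None and frozenset((v, best)) not in existing:
            new_edges.append((v, best))
            existing.add(frozenset((v, best)))
    return new_edges


# Two conduits surveyed separately, with a 0.2 m gap at the junction.
coords = [(0, 0, 0), (10, 0, 0), (10.2, 0, 0), (20, 0, 0)]
edges = [(0, 1), (2, 3)]
repaired = connect_close_extremities(coords, edges, tol=0.5)
```

The brute-force neighbour search is quadratic in the number of stations; for large surveys a spatial index would be the natural replacement, but the logic stays the same.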

2.2. Karst networks as graphs

Karst networks are considered in the analysis as mathematical graphs. This approach is not new, as Howard (1971) already proposed such a representation of natural systems to quantitatively analyse what was represented on planar maps, and several other authors have done likewise since (e.g., Glennon and Groves, 2002; Glennon, 2001; Pardo-Iguzquiza et al., 2011; Piccini, 2011).

From the preprocessed networks, two different representations are used for the statistical analysis. For geometrical analysis, which requires real distance computations, the complete networks are directly used. But for topological parameters, a reduced representation has been defined that speeds up the computations.

2.2.1. Complete graph

In the complete graph of a karst, an edge c_{i,j}, or link, represents a line-of-sight between two stations that correspond to graph vertices i and j (Fig. 4A–B). Each vertex, or node, is characterized by its Cartesian coordinates i = {x_i, y_i, z_i}. The initial data, given as successions of distance, direction, and inclination along lines-of-sight from an origin point, have been converted to {x, y, z} coordinates to end with a network composed of a set of n vertices and s edges connecting them. In the karst perspective, the direction of a segment may intuitively be associated with the direction of the flow in the conduit. But this direction of flow is sometimes ambiguous, notably because of the presence of cycles in the karst network. Thus, the graph is undirected.
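The conversion from line-of-sight measurements to Cartesian coordinates can be sketched as follows. The axis convention (x to the east, y to the north, z upward), the azimuth measured clockwise from north, the inclination counted positive upward, and the name `shots_to_coordinates` are all assumptions for this illustration; actual survey software may use other conventions and will usually also distribute loop-closure errors, which this sketch does not do.

```python
import math


def shots_to_coordinates(shots, origin=(0.0, 0.0, 0.0)):
    """Convert survey shots to station coordinates.

    shots: list of (from_station, to_station, distance, azimuth_deg,
    inclination_deg), ordered so that `from_station` is already positioned.
    Assumed convention: x east, y north, z up; azimuth clockwise from north;
    inclination positive upward.
    """
    coords = {shots[0][0]: origin} if shots else {}
    for frm, to, dist, az, inc in shots:
        if to in coords:
            continue  # station already positioned (e.g., a loop closure)
        x0, y0, z0 = coords[frm]
        horiz = dist * math.cos(math.radians(inc))  # horizontal projection
        coords[to] = (
            x0 + horiz * math.sin(math.radians(az)),
            y0 + horiz * math.cos(math.radians(az)),
            z0 + dist * math.sin(math.radians(inc)),
        )
    return coords


# A 10 m shot due east, then a 5 m vertical drop.
stations = shots_to_coordinates([(1, 2, 10.0, 90.0, 0.0), (2, 3, 5.0, 0.0, -90.0)])
```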

The degree k_i of a vertex is defined as the number of edges that are linked to this vertex. The first neighbours of a vertex i are the vertices that can be reached from i following one unique edge. Depending on the degree of a vertex we define (Fig. 4):

• the extremity nodes as the vertices with k_i = 1; and

• the internal nodes as the vertices with k_i > 1; among them, the junction nodes are the vertices with k_i > 2.

A branch is defined as a set of adjacent edges connecting vertices of degree 2. Each branch represents the portion of the curve connecting an extremity or a junction node with another junction or extremity node (Fig. 4B).

2.2.2. Reduced graph

The number of internal nodes of degree k = 2 in a topographical survey is largely a function of the field conditions and the speleologist's sampling preferences. These vertices do not give information about the topology of the karstic network. To simplify and speed up the topological analysis of the network, we defined a reduced graph. The reduction process consists in removing all vertices of degree 2 from the network (Figs. 4C and 5A). Thus, in the reduced graph, only junction and extremity nodes are kept, gathered under the term seed vertices. Two seed vertices are then linked by an edge of unit length that replaces the branch of the corresponding explored path (Fig. 4C). The reduced graph is thus composed of N seed vertices and S edges linking them.

Fig. 1. The data set: 31 three-dimensional cave survey data sets have been gathered thanks to the help of various speleologists who agreed to share them. Three 2D networks were also used in the study and complete the database: Blue Spring, Crevice, and Crossroads. Four parts of the Sieben Hengste karst are also studied: SP1 and SP2 are subparts also included in the LargePart network; the UpPart is an independent one, not considered in the LargePart network. A relative altitude scale is indicated to ease the perception of the third dimension. The total cave survey length is also indicated. For data protection reasons, we are not allowed to give more precise locations or scales of the networks.

Fig. 2. Example of a cave map (plan view) and associated survey data. (A) 27 measurement stations i = 1, 2, ..., n linked by 28 lines-of-sight c_{i,j} constitute the karst network. (B) Plan and side views: definition of length, azimuth, and inclination along a line-of-sight. Cross section: in the best case four measures (up, down, left, right) are made; more often, only width and height are recorded.

Fig. 3. Errors linked to the data acquisition process. Cases A and B can be automatically detected and corrected when processing the data. (A) Additional point and line-of-sight result from a back-line-of-sight measurement; the green point and segment should not have been recorded. (B) A small gap appears between the first and last point of the cycle, resulting in a cycle closure error. (C and C') Impact of the sampling location on the apparent sinuosity of the explored conduit: a regular sampling and a systematic positioning of the station in the middle of the conduit could limit it. (D) Sun-ray configuration resulting from the strategy chosen to sample one big cave. (E) Some parts of the network remain unexplored owing to accessibility criteria. (F) The addition of a new station can change the local topology of the network.

Fig. 4. From field data to graphs. (A) Map view showing the karst network and the survey stations. (B) Corresponding graph with 5 external nodes and 23 internal nodes, among which 6 are junction nodes; 28 edges are gathered into 12 branches. (C) Reduced graph: only the seed vertices are kept to obtain a topologically simplified representation of the network. (D) Undirected adjacency matrix representation of the reduced graph.

This process, while straightforward, also takes special cases into account. Indeed, when two different conduits link the same two seed vertices, forming a cycle, the cyclic structure has to be kept in the reduced network. To do so, one of the intermediate vertices is kept in each conduit (Fig. 5B: cycle 1). Another special situation comes from the cycles in the network. Cycles are defined as conduits starting and ending at the same seed vertex, without crossing any other seed vertex. In order to preserve the cyclic structure in the reduced network, two intermediate nodes are also kept in the looping conduit (Fig. 5B: cycle 2).
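The reduction, including the two special cases, can be sketched in Python as below. This is one possible reading of the text rather than the authors' code: the name `reduce_graph`, the integer vertex labels, the choice of which intermediate vertices to retain, and the assumption of a simple input graph (no duplicate edges, no self-connections) are illustrative. Isolated rings made only of degree-2 vertices contain no seed vertex and are ignored here.

```python
from collections import defaultdict


def reduce_graph(edges):
    """Return (seeds, reduced_edges) for an undirected simple graph.

    edges: list of (i, j) pairs with sortable vertex labels.
    Seed vertices are all vertices whose degree differs from 2.
    """
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seeds = {v for v in adj if len(adj[v]) != 2}

    # Walk every branch once, from seed vertex to seed vertex.
    branches, walked = [], set()
    for s in sorted(seeds):
        for n in sorted(adj[s]):
            if (s, n) in walked:
                continue
            path, prev, cur = [s, n], s, n
            while cur not in seeds:
                nxt = next(v for v in adj[cur] if v != prev)
                path.append(nxt)
                prev, cur = cur, nxt
            walked.add((s, n))
            walked.add((cur, prev))  # same branch seen from its other end
            branches.append(path)

    # Group branches by end points to spot parallel conduits and loops.
    groups = defaultdict(list)
    for path in branches:
        groups[frozenset((path[0], path[-1]))].append(path)

    reduced = []
    for paths in groups.values():
        for path in paths:
            a, b, inner = path[0], path[-1], path[1:-1]
            if a == b:
                # Looping conduit: keep two intermediate vertices.
                reduced += [(a, inner[0]), (inner[0], inner[-1]), (inner[-1], a)]
            elif len(paths) > 1 and inner:
                # Parallel conduits: keep one intermediate vertex in each.
                mid = inner[len(inner) // 2]
                reduced += [(a, mid), (mid, b)]
            else:
                reduced.append((a, b))
    return seeds, reduced


# A linear conduit 0-1-2-3 with a side passage 2-4: vertex 1 is removed.
seeds, reduced = reduce_graph([(0, 1), (1, 2), (2, 3), (2, 4)])
```

Keeping intermediate vertices, rather than allowing parallel edges or self-loops, means the reduced graph stays a simple graph and can be stored in a binary adjacency matrix.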

2.2.3. Numerical representations

Two representations are used for the networks. The position-links or node-links representation corresponds to a geometrical description of the network. It consists of two matrices: a matrix [X Y Z] storing the positions of the vertices and a matrix [i j] of edges between the vertices. This representation can be visualized in 3D in the Gocad geomodelling software or in Matlab, for which specific functions [...] each node is known, the position matrix can be extended with an additional column D leading to the new matrix [X Y Z D].

Fig. 5. From field data to graphs: (A) Long linear caves; (B) special cases where nodes of degree 2 are kept in the reduced graph to preserve the topology.

To quickly and easily compute topological parameters, an adjacency matrix representation is also used (Fig. 4D). It is a square matrix A_d, where each element w_ij expresses the existence (w_ij = 1) or absence (w_ij = 0) of an edge from vertex i to vertex j. As we consider undirected networks, the adjacency matrix is symmetric, as each edge from i to j is also an edge from j to i. In this work, we do not allow self-connecting vertices, i.e., vertices that have a link to themselves, and thus w_ii = 0 for all i. Using the adjacency matrix, one can easily compute some statistics on the networks. The degree of each vertex is, for instance, the row-sum or the column-sum of the adjacency matrix.
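As a small illustration of this representation, the sketch below builds a symmetric binary adjacency matrix from an edge list and recovers the vertex degrees as row sums. The function names and the use of nested Python lists (rather than a Matlab or sparse matrix) are assumptions made for the example.

```python
def adjacency_matrix(n_vertices, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph without self-links."""
    A = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-connecting vertices are not allowed")
        A[i][j] = A[j][i] = 1
    return A


def degrees(A):
    """Degree of each vertex, obtained as the row sum of the adjacency matrix."""
    return [sum(row) for row in A]


# One junction (vertex 1) connected to three extremities.
A = adjacency_matrix(4, [(0, 1), (1, 2), (1, 3)])
k = degrees(A)                     # [1, 3, 1, 1]
average_degree = sum(k) / len(k)   # 1.5
```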

3. Metrics to characterize network geometries

We propose here several metrics to characterize the karstic network geometries. They are all calculated on the complete graph of networks stored in a position-link representation. For each metric, we discuss the results obtained on the 31 three-dimensional networks of our database.

3.1. Conduit orientation

Several geological features constitute natural drains for underground fluids and thus strongly influence the development of karstic conduits. Fractures count among those main features (Palmer, 1991). Fractures are generally organized into families of particular orientation depending on the regional stress field (e.g., Billaux et al., 1989; Zoback, 1992; Beekman et al., 2000). As a result, karstic networks that are mainly developed along the prominent fractures will show a network pattern, i.e., an angular grid of intersecting passages (Palmer, 1991).

To analyse the 3D network geometry, it is thus relevant and classical to compute the conduit orientations (e.g., Kiraly et al., 1971; Ford and Williams, 2007; Pardo-Iguzquiza et al., 2011). In 3D, edges are assumed to point downward. The orientation of an edge is described by its azimuth (angle between the north and the segment, ranging from 0° to 360°) and its dip (angle with the horizontal plane, ranging from 0° to 90°). Resulting values are stored on a 3D pointset corresponding to the midpoints of each line-of-sight (Fig. 6A and B). Typically, these data are analysed with a Rose diagram to represent azimuth and dip distributions (Fig. 6C). No azimuth can be assigned to vertical conduits, and edge lengths vary. Thus, to compute the Rose diagram, each orientation value has been weighted by the length of the edge projection on the horizontal plane.
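A minimal sketch of the per-edge orientation computation is given below. It assumes x pointing east, y pointing north, and z pointing upward, which the text does not specify, and the function name `edge_orientation` is hypothetical. The returned horizontal length is the weight used for the Rose diagram; a strictly vertical edge gets no azimuth and a zero weight.

```python
import math


def edge_orientation(p, q):
    """Return (azimuth_deg, dip_deg, horizontal_length) of the edge p-q.

    The edge is first oriented to point downward. Assumed axes: x east,
    y north, z up. The azimuth is None for a strictly vertical edge.
    """
    if q[2] > p[2]:
        p, q = q, p  # make the edge point downward
    dx, dy, dz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    horiz = math.hypot(dx, dy)
    dip = math.degrees(math.atan2(-dz, horiz))  # between 0 and 90 degrees
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0 if horiz > 0 else None
    return azimuth, dip, horiz


# An edge plunging toward the north-east, and a vertical shaft.
print(edge_orientation((0.0, 0.0, 0.0), (1.0, 1.0, -1.0)))
print(edge_orientation((0.0, 0.0, 0.0), (0.0, 0.0, 5.0)))
```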

To better assess the third dimension and combine azimuth-dip analysis, we also project the results on a Schmidt's stereonet (which preserves the areas) and draw a weighted density map (Fig. 6D and E). The density map counts the number of data points inside a cell counter that has by default a radius of one tenth of the stereonet. For density map computation, to avoid bias owing to a heterogeneous sampling step, each point is weighted by the real edge length. The density map thus expresses the percentage of orientation data that are contained in each 1% of the entire stereonet area. Data interpretation of the density map is done manually. We consider that preferential orientations exist when localised data clusters clearly appear (density ≥ 4%). Compared to the Rose diagram, the density map is essential to identify significant proportions of subvertical edges (Fig. 7). Families of poles can be defined for which circular statistics are provided. As the three 2D networks come from initial raster data, the orientation of their segments is aligned with the background grid, making them irrelevant for such an analysis.

To better quantify the differences in orientations of conduits, we propose to compute the entropy of orientations, H_O, as recently done for urban street networks (Gudmundsson and Mohajeri, 2013). Entropy is generally assimilated to a measure of disorder (e.g., Ziman, 1979; Journel and Deutsch, 1993). We use the Shannon entropy formula:

$$H_O = -\sum_{i=1}^{t} p_i \log_{n_{bins}}(p_i) \qquad (1)$$

with p_i the probability of an edge (weighted by its length) to fall in the i-th bin, n_bins the number of bins used when computing the histogram of the orientations, and t the number of bins with nonzero probabilities. Note that the entropy calculation uses a base-n_bins logarithm.

As the orientation can only take values between 0° and 360°, and as we used undirected graphs, data of opposite directions are counted in the same bin and Rose diagrams are symmetric. This is also handled in the orientation entropy computation, where we used bin widths of 10°, thus fixing the number of bins at 18 (180°/10°). The shape of the probability distribution is thus quantitatively assessed by the entropy value, which equals 0 if all edges occupy a single bin and equals 1 if all bins have exactly the same probabilities (i.e., the distribution is uniform).
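Eq. (1) with the binning described above can be sketched as follows. The function name `orientation_entropy` and its arguments are illustrative; the weights are expected to be the horizontal projected edge lengths, so vertical edges (zero weight) can simply be left out of the input.

```python
import math


def orientation_entropy(azimuths_deg, weights, bin_width=10.0):
    """Length-weighted Shannon entropy of azimuths, using a base-n_bins logarithm.

    Opposite directions are folded into the same bin (azimuth modulo 180),
    giving 18 bins for the default 10-degree width.
    """
    n_bins = int(round(180.0 / bin_width))
    hist = [0.0] * n_bins
    for az, w in zip(azimuths_deg, weights):
        folded = az % 180.0  # N10 and N190 describe the same undirected edge
        hist[min(int(folded // bin_width), n_bins - 1)] += w
    total = sum(hist)
    probs = [h / total for h in hist if h > 0]
    return -sum(p * math.log(p, n_bins) for p in probs)


# All edges aligned along N15-N195: a single bin is occupied, entropy is 0.
h_aligned = orientation_entropy([15.0, 195.0, 12.0], [4.0, 2.5, 1.0])
# One edge of equal length in each of the 18 bins: entropy is 1.
h_uniform = orientation_entropy([5.0 + 10.0 * i for i in range(18)], [1.0] * 18)
```

Normalizing by the base-n_bins logarithm is what bounds the value between 0 and 1 regardless of the number of bins chosen.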


Fig. 6. Orientation analysis of the Daren Cilau karst. (A and B) Azimuth (A) and dip (B) properties stored on the pointset corresponding to the midpoints of each edge (s = 2687). (C) Rose diagram (symmetric) showing a preferential orientation of the edges along the N160° (=N330°) direction. Frequency computation is weighted by the length of the edge projection on the horizontal plane to take into account the heterogeneous sampling step and the absence of direction associated with vertical conduits. The corresponding orientation entropy H_O = 0.921 is among the lowest measured values, expressing the existence of a preferential direction. (D) Projection of the orientations on a Schmidt's stereonet (equal areas). (E) Density map computed on the Schmidt's stereonet (% of data in 1% of the entire stereonet area). The density map enhances the existence of a preferred direction family N160°–0° dip and of the close to horizontal development of this karstic network.

On the 31 three-dimensional karst networks that we analysed, the orientation entropy varies from 0.746 (for ClydachGorge Caf1) to 0.997 (OxBelHa). These globally high values express the fact that all directions are often observable. When preferred directions exist, their relative frequency rarely exceeds 20% of all data. In the data set indeed, the ClydachGorge Caf1 karst appears as an exception: it is a quite linear network, and the orientation analysis shows a maximal frequency of 32% along N150°, which explains the entropy value H_O = 0.794. Ranked by increasing entropy, the next network is Genie Braque with H_O = 0.884, which corresponds to two main orientations, one clearly marked around the N50° direction. Then, the entropy of orientation increases regularly as preferred orientations are less and less distinguishable (Appendix A) until values are very close to 1. For example, the karst of Lechuguilla has an entropy of H_O = 0.996, which expresses the fact that all directions are almost equally observed in the network (Fig. 8). It corresponds to sinuous and curvilinear patterns developed when the passages are mostly influenced by almost horizontal bedding planes. Fixing a threshold value for which no preferential direction would be defined is quite difficult, but entropy of orientations appears as a good quantitative way to classify networks according to an orientation criterion. Nonetheless, caution has to be taken in the interpretation as the entropy is only computed on the horizontal projection of the orientations. Thus, a preferential subvertical orientation does not appear on the Rose diagram and is ignored in the orientation entropy computation. This confirms the usefulness of the complementary density map analysis. In Appendix A, the karstic networks that show a preferred subvertical orientation are classified separately (Fig. A.23): again from the lowest orientation entropy (which is observed for networks with 2 preferred orientations, a subhorizontal and the vertical one: Arphidia Robinet, H_O = 0.918) to the highest orientation entropy (observed for Krubera, H_O = 0.996, which is mainly vertically developed).

The variety of orientation analysis patterns observed on just 31 networks shows the variability that one can encounter when studying karstic systems. Most of the time, preferential orientations relate to particular inception features: tectonic (joints, fractures, and faults) or stratigraphic (bedding planes; Filipponi et al., 2009). For example, in the Sieben Hengste Large Part network, a main orientation of N100°/16° plunge is identified in addition to a vertical one. This is consistent with the dip and direction of the main karstifiable formation in which the system developed (Jeannin, 1996). Also, the direction observed in the Han-sur-Lesse network (N95°/1° plunge) is linked to one preferential direction of fracturation observed in the field (Bonniver, 2011).

Conduit orientation is thus an interesting parameter for detecting the geological features of influence and for better understanding the speleogenetic processes that have locally dominated. In this way, entropy of orientations constitutes a useful metric to quantitatively assess the existence and relative importance of preferential karstic developments.

Fig. 7. Orientation analysis of the Ratasse karst: an example of a network with two preferential orientations, a subvertical one and a subhorizontal one. (A and A') Top and southern views of the Ratasse karst. (B) Rose diagram enhancing the moderate subhorizontal preferential orientation along the N95–275° direction, which explains the orientation entropy H_O = 0.986. (C) Density map computed on the Schmidt's stereonet enhancing the clear preferential vertical orientation of some of the conduits (dip close to 90° gathers up to 20% of the edges), clearly visible on the southern view A'.

3.2. Length, length entropy, and coefficient of variation of the lengths

The curvilinear length of a branch $l_{i,j}$ is measured from one seed vertex i to the following one j by summing the lengths of the edges it contains. The average branch length varies from 8.46 m for the Monachou cave to 331.74 m for the Clydach Gorge Caf 1 karstic system (Table 2). That said, most of the values range from 20 to 70 m, with a set of only six networks showing higher values: Agen Allwed, Clydach Gorge Caf1, Genie Braque, Ox Bel Ha, Shuanghe, and Wakulla. Those networks do not correspond to a specific pattern: the first three are elongated with one main long conduit, whereas the other three are closer to branchwork patterns. One could have supposed that exploration in the largest caves could have been done with a looser sampling, inducing a smaller average branch length. This is not at all the case, as no linear relationship appears with the total survey length (linear correlation coefficient r = 0.07).

As for orientation, we also propose to compute the Shannon entropy of the lengths $H_{len}$ to measure the variability of conduit lengths in the network: length entropy will be maximal for a uniform distribution and equal to zero if all conduits have exactly the same length. In practice, contrary to orientations, lengths are not bounded values. To have comparable entropy values between all networks, branch lengths are normalized by the maximum branch length of the network. Then, the results are grouped into 10 bins. In this study, the entropy of the lengths is therefore defined as

$$H_{len} = -\sum_{i=1}^{t} p_i \log_{10}(p_i) \tag{2}$$

with $p_i$ the weighted probability of branches falling in the i-th bin, and t the number of bins with nonzero probabilities of edges.

We also computed the coefficient of variation of the lengths $CV_{len}$ to characterize the dispersion of the measurements. It is expressed as a percentage and corresponds to the ratio of the standard deviation $\sigma_{len}$ to the mean $\overline{len}$:

$$CV_{len} = \frac{\sigma_{len}}{\overline{len}} \times 100 \tag{3}$$
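As an illustration, Eqs. (2) and (3) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, each branch is given the same weight when estimating $p_i$ (the weighting scheme is not detailed in the text), and the sample standard deviation is used for $\sigma_{len}$.

```python
import math
import statistics


def length_entropy(lengths, n_bins=10):
    """Base-10 Shannon entropy of branch lengths normalized by the longest branch."""
    longest = max(lengths)
    counts = [0] * n_bins
    for length in lengths:
        # Normalized lengths lie in (0, 1]; the longest branch falls in the last bin.
        index = min(int(length / longest * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(lengths)
    # Assumption: unweighted frequencies; empty bins are skipped (sum over t bins).
    return -sum((c / total) * math.log10(c / total) for c in counts if c > 0)


def length_cv(lengths):
    """Coefficient of variation in percent (assumed: sample standard deviation)."""
    return statistics.stdev(lengths) / statistics.mean(lengths) * 100.0
```

With 10 bins and a base-10 logarithm, one branch in each bin gives an entropy of 1, whereas identical lengths give 0.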

The length entropy $H_{len}$ varies from 0.18 to 0.74. The coefficient of variation of the lengths $CV_{len}$ varies from 0.90% to 2.56% (Table 2). No clear linear relationship is observable between both parameters, but a slight negative correlation is observed (r = −0.69, see also Section 5). The Charentais Heche network is a good example to illustrate the respective meaning of these two parameters (Fig. 9). This network has the highest coefficient of variation of our data set, $CV_{len} = 2.56\%$: branch lengths indeed vary from low (around 2 m) to high values (around 1550 m) for an average length $\overline{len} = 50$ m. But in the case of Charentais Heche, these extrema are outliers that squeeze the data into the first three bins (Fig. 9), inducing a low entropy of lengths, $H_{len} = 0.18$. Conversely, the Genie Braque karst is characterized by the highest length entropy of our data set ($H_{len} = 0.74$), which expresses the more uniform distribution of the lengths (Fig. 9). In this case, the branch lengths do not vary very much around the mean, which is expressed by a very low value of $CV_{len} = 0.95\%$.

3.3. Conduit tortuosity (or sinuosity)

Tortuosity, also called sinuosity, is a classical metric used to characterize karstic morphologies (Jeannin et al., 2007; Pardo-Iguzquiza et al., 2011). It is used as well to parameterize karst network

Fig. 8. Orientation analysis of the Lechuguilla karst: an example of a network with no preferential orientation. (A) Top view of the Lechuguilla karst. (B and C) Rose diagram and density map showing the absence of preferential conduit orientation, confirmed by the very high value of orientation entropy $H_O = 0.996$.


Table 2

Geometry measures on the 31 three-dimensional karst networks (L is the total explored length, in kilometers; HO is the entropy of orientations; n is the number of survey stations, i.e., the number of nodes of the complete graph; s is the number of lines of sight, i.e., edges; mean len is the average length of karst branches, in meters; Hlen is the entropy of karst branch lengths; CVlen is the coefficient of variation of the branch lengths, expressed in %; mean t is the average tortuosity of karst branches); for each parameter, minimal and maximal values are indicated in bold.

Network                 L (km)  HO     n      s      mean len (m)  Hlen  CVlen (%)  mean t
AgenAllwed              13.7    0.964  1874   1893   119.99        0.46  1.84       1.54
ArphidiaRobinet         13.5    0.918  1461   1477   70.73         0.63  1.09       1.42
Arrestelia              60.9    0.907  6257   6268   69.74         0.34  1.49       1.35
Ceberi                  7.2     0.972  881    886    55.61         0.50  1.35       1.27
CharentaisHeche         13.5    0.986  1985   2013   50.84         0.22  2.56       1.25
ClydachGorge Caf1       2.3     0.746  230    229    331.74        0.55  1.51       1.18
ClydachGorge OgofCapel  0.8     0.892  145    146    50.80         0.58  1.55       1.35
DarenCilau              20.2    0.921  2665   2687   55.85         0.32  1.87       1.21
EglwysFaen              1.4     0.948  307    308    15.81         0.68  1.29       1.14
FoussoubieEvent         2.6     0.928  333    339    33.20         0.50  1.61       1.27
FoussoubieGoule         20.2    0.987  2326   2354   62.30         0.66  1.41       1.21
GenieBraque             2.7     0.884  164    163    194.03        0.74  0.95       1.20
GrottesDuRoy            3.8     0.981  724    733    29.76         0.65  1.10       1.24
HanSurLesse             9.8     0.976  1668   1705   62.06         0.63  1.26       1.12
Krubera                 13.2    0.996  2150   2157   55.31         0.53  1.87       1.30
Lechuguilla             329     0.996  12725  13503  68.96         0.57  0.91       1.26
Llangattwg              0.9     0.954  232    228    15.32         0.66  1.14       1.15
Mammuthöhle             63.9    0.996  9348   9712   30.99         0.25  1.29       1.42
Monachou                0.7     0.984  251    258    8.46          0.59  0.90       1.24
OjoDelAgua              12.3    0.964  1292   1326   43.81         0.53  1.43       1.21
OxBelHa                 143     0.997  10098  10098  170.79        0.65  0.94       1.37
PicDuJer                0.6     0.933  102    102    25.51         0.64  1.12       1.21
Ratasse                 3.6     0.986  692    693    64.86         0.44  1.59       1.41
SaintMarcel             56.5    0.993  5506   5556   75.96         0.40  1.78       1.23
Sakany                  7.5     0.992  1716   1784   20.81         0.49  0.92       1.40
Shuanghe                130     0.988  7581   7634   118.76        0.46  1.28       1.22
SiebenHengsteLargePart  82.2    0.988  15340  15570  36.84         0.18  1.73       1.36
SiebenHengsteUpPart     3.3     0.978  753    753    52.46         0.72  1.04       1.36
SiebenHengsteSP1        7.5     0.991  1881   1915   26.94         0.58  1.32       1.40
SiebenHengsteSP2        6.8     0.991  1396   1418   39.52         0.64  1.21       1.43
Wakulla                 17.8    0.982  474    477    330.51        0.56  1.53       1.28

simulations (Pardo-Iguzquiza et al., 2012). It is addressed at the scale of a branch. The tortuosity $t_{i,j}$ from one seed vertex i to the following one j is the ratio between the curvilinear length $l_{i,j}$ along the branch and the Euclidean distance $d_{i,j}$ between the two seed vertices:

$$t_{i,j} = \frac{l_{i,j}}{d_{i,j}} \tag{4}$$

In order to avoid divergence of the tortuosity coefficient, we discard all looping paths from the computation, since the Euclidean distance between their end vertices is zero. The tortuosity coefficient of a network is defined as the mean tortuosity $\bar{t}$ of all the branches in the network.
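The branch tortuosity and its network average can be sketched as follows. This is a minimal illustration, assuming that each branch is available as an ordered list of 3D station coordinates; the function names are hypothetical and do not come from the study.

```python
import math


def branch_tortuosity(points):
    """Curvilinear length over end-to-end Euclidean distance, or None for a loop."""
    curvilinear = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    straight = math.dist(points[0], points[-1])
    if straight == 0.0:
        return None  # looping branch: discarded to avoid a division by zero
    return curvilinear / straight


def mean_tortuosity(branches):
    """Mean tortuosity over all non-looping branches of a network."""
    values = [t for t in map(branch_tortuosity, branches) if t is not None]
    return sum(values) / len(values)
```

A straight branch gives a tortuosity of 1, and an L-shaped branch with legs of 3 m and 4 m gives 7/5 = 1.4.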

Tortuosity coefficients range from 1.12 (Han-sur-Lesse) to 1.54 (Agen Allwed: Table 2). Tortuosity values are quite difficult to interpret. Indeed, one could have assumed that high tortuosity coefficients would reflect sinuous, curvilinear patterns of conduits like branchwork or anastomotic caves as defined by Palmer (1991). But the results seem more linked to the sampling strategy than to a clear cave pattern (Fig. 10). This refers to the limitations of the data acquisition process, which is not guided by statistical considerations but by surveying constraints (Section 2.1), which also vary depending on the speleologist team. Nevertheless, this parameter is sometimes proposed as an input parameter for stochastic simulations (Pardo-Iguzquiza et al., 2012).

4. Statistical measures of topology

To complement the geometrical analysis of karstic networks, we propose to use several parameters from graph theory to characterize network topology. Table 3 summarizes these metrics, which are computed on the reduced graph representations. As only the topology matters here, the three 2D networks are also included in the database, allowing us to have 34 networks to analyse.

4.1. Considering cycles

In the following, tree graphs, also called acyclic networks, refer to networks that do not have any cycles or passage loops. They correspond to branchwork patterns (Palmer, 1991). The terms interconnected networks or maze patterns are used to name networks that are not acyclic, i.e., networks with several passage loops. These networks group together the anastomotic patterns (dominated by curvilinear conduits) and the network patterns (dominated by straight conduits linked to the enlargement of fractures) of Palmer’s classification.

4.1.1. Connected components and cyclomatic number

A connected component of a graph is a subgraph in which every node is reachable from every other node of the subgraph. Reachability corresponds to the existence of a path between the nodes. To compute the connected components of a cave system, we use a depth-first search algorithm that tags each vertex with the index of the connected component to which it belongs (Fig. 11). The number of connected components corresponds to the number of subgraphs p as defined in graph theory. Our data set gathers karstic systems mostly composed of 1 to 5 connected components (Table 4). The Mammuthöhle karst, however, has the particularity of being composed of 25 connected components. Showing multiple connected components means that the network is divided into several parts that are not linked to one another by a mapped karstic conduit. Two interpretations can explain this absence of link: (i) a hydrological connection does exist and has been proved by tracer tests, but the corresponding conduits have not yet been, or cannot be, explored; or (ii) the connection does not exist.

Fig. 9. Charentais Heche and Genie Braque karsts: colors vary to highlight the different karst branches. On the right, the corresponding length distributions are represented. The rather wide distribution of Genie Braque branch lengths is expressed through a high $H_{len}$ value ($H_{len} = 0.74$), while the more concentrated distribution of Charentais Heche corresponds to a low value of $H_{len} = 0.18$.

The cyclomatic number $N_{cycl}$ is a classical parameter used in graph theory. It reflects the number of cycles, or passage loops, inside a network, which is high for maze patterns and equals 0 for acyclic networks. It can be automatically calculated from the total number of nodes N, the number of edges S, and the number of subgraphs p: $N_{cycl} = S - N + p$. A large variety of cases are encountered in our database, with cyclomatic numbers ranging from 0 to 779 for the highly anastomotic Lechuguilla karst.
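The depth-first tagging of connected components and the cyclomatic number can be sketched as below. This is a minimal illustration, assuming that vertices are numbered from 0 to N − 1 and that the graph is given as a list of vertex pairs; the function names are hypothetical.

```python
def connected_components(n_vertices, edges):
    """Tag each vertex with a component index using an iterative depth-first search."""
    neighbours = [[] for _ in range(n_vertices)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    tag = [-1] * n_vertices  # -1 means "not visited yet"
    p = 0
    for start in range(n_vertices):
        if tag[start] != -1:
            continue
        tag[start] = p
        stack = [start]
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if tag[v] == -1:
                    tag[v] = p
                    stack.append(v)
        p += 1
    return tag, p


def cyclomatic_number(n_vertices, edges):
    """N_cycl = S - N + p, with S edges, N vertices, and p connected components."""
    _, p = connected_components(n_vertices, edges)
    return len(edges) - n_vertices + p
```

For the Lechuguilla values of Table 4 (N = 4187, S = 4965, p = 1), the formula gives 4965 − 4187 + 1 = 779.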

4.1.2. Howard parameters

Derived from graph theory (Garrison and Marble, 1962; Kansky, 1963), Howard’s parameters were developed to characterize braided patterns of streams (Howard et al., 1970). They were rapidly applied to quantify karst network connectivity (Howard, 1971). Howard’s parameters are defined for planar graphs and depend on three graph attributes: the number of external nodes $N_{ext}$, the number of junction nodes $N_{junc}$, and the cyclomatic number $N_{cycl}$ (Fig. 4).

Fig. 10. Three top views of karstic networks of increasing tortuosity coefficient: Han-sur-Lesse $\bar{t} = 1.12$, Ojo del Agua $\bar{t} = 1.21$, and Agen Allwed $\bar{t} = 1.54$. No evidence of a link between the cave pattern and the tortuosity coefficient arises.

Table 3

List of metrics used to characterize the karst topology; all are computed on the reduced graphs.

Ncycl     Cyclomatic number (number of passage loops)
DC        Connectivity degree
H         Global cyclic coefficient (Kim and Kim, 2005)
mean k    Average vertex degree
CVk       Coefficient of variation of degrees
rk        Correlation of vertex degrees (Newman, 2002)
mean SPL  Average shortest path length
CPD       Central point dominance (Freeman, 1977)

The parameter a designates the ‘ratio of the observed number of cycles to the greatest possible number of [cycles] for a given number of nodes’ (Howard et al., 1970):

$$a = \frac{N_{cycl}}{2(N_{junc} + N_{ext}) - 5} \tag{5}$$

The parameter b is the ratio of the number of edges to the number of seed vertices:

$$b = \frac{S}{N_{junc} + N_{ext}} \tag{6}$$

The parameter c expresses ‘the ratio of the observed number of edges to the greatest possible number of edges for a given number of nodes’ (Howard et al., 1970), i.e., the ratio of the number of karst branches to the maximal possible number of connections:

$$c = \frac{S}{3(N_{junc} + N_{ext} - 2)} \tag{7}$$

Fig. 11. The 25 connected components of the Mammuthöhle network: each color represents one connected component.

It is important to emphasize again that these parameters were developed for 2D planar graphs. In our data set, most networks are 3D, and their topology cannot always be reduced to that of their 2D horizontal projection (as is the case for Arphidia Robinet, Mammuthöhle, or Krubera).

However, even if the assumptions underlying Howard’s calculations are not met, it is interesting to compute those parameters in order to compare our results with the 25 networks that Howard studied. Howard (1971) indicates that the respective values of a, b, and c are close to 0, 1, and 0.33 for a branchwork karst, and close to 0.25, 1.5, and 0.5 for a reticular one. They can thus be synthesized into the connectivity degree DC, which is expected to be close to 0 for a branchwork karst and close to 1 (or 100 if expressed in %) for a reticular karst:

$$DC = \frac{1}{3}\left(\frac{a}{0.25} + \frac{b - 1}{0.5} + \frac{c - 0.33}{0.17}\right) \tag{8}$$
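Equations (5) to (8) translate directly into code. The sketch below is only illustrative: the function names are hypothetical, and the number of seed vertices N of the reduced graph is assumed to equal $N_{junc} + N_{ext}$.

```python
def howard_parameters(n_seed, n_edges, n_cycles):
    """Return (a, b, c) from the number of seed vertices (junction + external nodes),
    the number of edges, and the cyclomatic number of the reduced graph."""
    a = n_cycles / (2 * n_seed - 5)
    b = n_edges / n_seed
    c = n_edges / (3 * (n_seed - 2))
    return a, b, c


def connectivity_degree(a, b, c):
    """Connectivity degree DC as a fraction (multiply by 100 for a percentage)."""
    return (a / 0.25 + (b - 1.0) / 0.5 + (c - 0.33) / 0.17) / 3.0


# Agen Allwed values from Table 4: N = 129, S = 148, N_cycl = 20.
a, b, c = howard_parameters(129, 148, 20)
dc_percent = 100.0 * connectivity_degree(a, b, c)
```

With these inputs, the sketch reproduces the Agen Allwed row of Table 4 to the printed precision (a = 7.91 × 10⁻², b = 1.15, c = 38.85 × 10⁻², DC = 31.82%).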

The results are presented in Table 4. The a values range from 0 to 0.16, b values from 0.88 to 1.28, and c values from 0.32 to 0.44, which is quite consistent with the reference values for karst. High values of the three parameters systematically concern the same networks: Han-sur-Lesse, Sakany, Mammuthöhle, and Sieben Hengste SP2. For the low values, however, the classification varies slightly depending on the parameter considered. In every case, low values correspond to obviously branchwork karsts, while high values are associated with more reticular patterns (Fig. 12). Between these extrema, all intermediate configurations (and values) are observed, and defining thresholds that would permit us to differentiate karst categories is quite difficult.

4.1.3. Cyclic coeﬃcient

The cyclic coefficient is a topological coefficient defined by Kim and Kim (2005) in order to measure how cyclic a network is. The local cyclic coefficient $H_i$ of a seed vertex i of degree $k_i$ is defined as the mean of the inverse of the sizes of the smallest cycles formed by vertex i and its neighbours (Fig. 13A). The mean is taken over all possible pairs of edges connected to the vertex i:

$$H_i = \frac{2}{k_i(k_i - 1)} \sum_{\langle j,h \rangle} \frac{1}{L^i_{jh}} \tag{9}$$

where $L^i_{jh}$ is the size of the smallest cycle that passes through vertices i, j, and h. If no cycle passes through i, j, and h, then $L^i_{jh} = \infty$ (Fig. 13A). The algorithm for the computation of the cyclic coefficient of vertex i consists in taking its first two neighbours (j and h) and removing the edges (i, j), (i, h) and their opposite counterparts (j, i) and (h, i) from the network. Then, the length $L_{jh}$ of the shortest path between j and h is computed. The length of the cycle is then obtained by adding the removed edges: $L^i_{jh} = L_{jh} + 2$. These edges are then reintroduced in the network, and the computation is repeated on another pair of neighbours of vertex i until all possible pairs have been visited.

The global cyclic coefficient of a network is defined as the mean cyclic coefficient of all the seed vertices of the network:

$$H = \frac{1}{N} \sum_{i} H_i \tag{10}$$
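The local and global cyclic coefficients can be sketched as follows. This is a hedged illustration rather than the code used in the study: the function names are hypothetical, the graph is assumed to be a dictionary mapping each vertex to its neighbours, and the shortest path between two neighbours is searched while excluding vertex i itself, an assumption chosen so that pairs of neighbours connected only through i get an infinite cycle size, as in the example of Fig. 13A.

```python
from collections import deque
from itertools import combinations


def _shortest_path_avoiding(adj, source, target, banned):
    """Breadth-first search length from source to target that never visits `banned`."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return distance[u]
        for v in adj[u]:
            if v != banned and v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return None  # no path: the cycle size is infinite and contributes 0


def local_cyclic_coefficient(adj, i):
    k = len(adj[i])
    if k < 2:
        return 0.0  # no pair of neighbours, hence no cycle through i
    total = 0.0
    for j, h in combinations(adj[i], 2):
        path = _shortest_path_avoiding(adj, j, h, banned=i)
        if path is not None:
            total += 1.0 / (path + 2)  # cycle size = path length + the two edges at i
    return 2.0 * total / (k * (k - 1))


def global_cyclic_coefficient(adj):
    return sum(local_cyclic_coefficient(adj, i) for i in adj) / len(adj)
```

On a small graph built to have the cycle sizes listed in the caption of Fig. 13A (3, 3, 4, and 5 among the ten pairs of neighbours), the local coefficient is 0.1 × (1/3 + 1/3 + 1/4 + 1/5) ≈ 0.11.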

If a network has a pure branchwork pattern, no cycle is present in the network and H = 0 (Fig. 13B). Conversely, if every vertex of a network is directly connected to all the other ones, the cyclic coefficient is equal to one-third (Fig. 13C). Note that the cyclic coefficient is slightly different from the global clustering coefficient, C, defined by Watts and Strogatz (1998), which is also used to measure how well a


Table 4

Topology measures on the 34 reduced graphs of the karst networks: considering cycles (N is the number of seed vertices; S is the number of edges; p is the number of connected components; Ncycl is the cyclomatic number (number of passage loops); a, b, and c are Howard’s parameters; DC is the connectivity degree; H is the global cyclic coefficient); maximal and minimal values are highlighted in bold.

Network                  N     S     p   Ncycl  a (10^-2)  b     c (10^-2)  DC (%)  H (10^-2)
AgenAllwed               129   148   1   20     7.91       1.15  38.85      31.82   5.70
ArphidiaRobinet          180   196   2   18     5.07       1.09  36.70      19.95   4.02
Arrestelia               879   890   3   14     0.80       1.01  33.83      3.52    0.83
BlueSpring (2D)          477   503   1   27     2.85       1.05  35.30      11.93   2.00
Ceberi                   132   137   4   9      3.47       1.04  35.13      11.33   3.28
CharentaisHeche          260   288   1   29     5.63       1.11  37.21      22.94   5.17
ClydachGorge Caf1        8     7     1   0      0.00       0.88  38.89      3.21    0.00
ClydachGorge Ogof Capel  19    20    1   2      6.06       1.05  39.22      23.78   5.26
Crevice (2D)             253   262   1   10     2.00       1.04  34.79      8.55    1.55
Crossroads (2D)          371   430   1   60     8.14       1.16  38.84      32.92   5.91
DarenCilau               351   373   1   23     3.30       1.06  35.63      13.73   2.19
EglwysFaen               88    89    1   2      1.17       1.01  34.50      5.25    0.61
FoussoubieEvent          75    81    1   7      4.83       1.08  36.99      19.59   3.50
FoussoubieGoule          302   330   1   29     4.84       1.09  36.67      19.83   3.79
GenieBraque              15    14    1   0      0.00       0.93  35.90      1.24    0.00
GrottesDuRoy             124   133   2   11     4.53       1.07  36.34      17.42   3.58
HanSurLesse              130   167   3   40     15.6       1.28  43.49      60.46   12.4
Krubera                  235   242   1   8      1.72       1.03  34.62      7.46    1.12
Lechuguilla              4187  4965  1   779    9.31       1.19  39.55      37.63   6.26
Llangattwg               67    63    5   1      0.78       0.94  32.31      -4.30   0.50
Mammuthöhle              2168  2532  25  389    8.98       1.17  38.97      34.87   6.44
Monachou                 80    87    1   8      5.16       1.09  37.18      20.91   3.63
OjoDelAgua               258   292   1   35     6.85       1.13  38.02      27.76   5.10
OxBelHa                  838   838   1   1      0.06       1.00  33.41      0.89    0.04
PicDuJer                 26    26    1   18     2.13       1.00  36.11      7.58    2.56
Ratasse                  59    60    1   2      1.77       1.02  35.09      7.58    1.13
SaintMarcel              713   763   3   53     3.73       1.07  35.77      15.08   2.83
Sakany                   344   412   1   69     10.1       1.20  40.16      40.68   6.46
Shuanghe                 1058  1111  3   56     2.65       1.05  35.07      10.93   1.91
SiebenHengsteLargePart   2063  2293  1   231    5.61       1.11  37.09      22.92   4.34
SiebenHengsteUpPart      65    65    5   5      4.00       1.00  34.39      8.06    3.24
SiebenHengsteSP1         250   284   1   35     7.07       1.14  38.17      28.64   5.24
SiebenHengsteSP2         156   178   1   23     7.49       1.14  38.53      30.23   6.03
Wakulla                  55    58    1   4      3.81       1.05  36.48      15.54   3.56

network is connected on a local neighbour-to-neighbour scale (e.g., Newman, 2003; Andresen et al., 2013).

The results obtained on the 34 karstic systems range from 0 (Clydach Gorge Caf 1 and Genie Braque) to 0.12 (Han-sur-Lesse) with a mean value of 0.04 (Table 4). An evident correlation appears with the connectivity degree and is further discussed in Section 5.

4.2. Measures on vertex degrees

4.2.1. Distribution and coeﬃcient of variation of vertex degrees

Simple statistics related to the vertex degrees can also be computed. These include the average vertex degree $\bar{k}$ and the standard deviation of vertex degrees $\sigma_k$.

Fig. 12. Three top views of karstic networks that have increasing values of Howard’s parameters and connectivity degree: low values characterize branchwork patterns, while high values are observed for reticular systems.


Fig. 13. Cyclic coefficient. (A) An example of the local cyclic coefficient computation: the vertex i has five neighbours, $k_i = 5$. Only one cycle passes through 1 and 2 or 4 and 5, $L^i_{12} = L^i_{45} = 3$. Two cycles pass through 3 and 5: by c1 and by c2 (four through 3 and 4). The shortest one is c1, so $L^i_{34} = 4$ and $L^i_{35} = 5$. Vertices 1 and 2 are not directly linked to vertices 3, 4, and 5, $L^i_{13} = L^i_{14} = L^i_{15} = L^i_{23} = L^i_{24} = L^i_{25} = \infty$. As a result, $H_i = 0.11$. (B and C) Two examples of global cyclic coefficient values on a small network with N = 6 vertices showing extreme cases: in case (B), no cycle is present in the network, H = 0; in case (C), each vertex is directly linked to the five other vertices, and all cycles have the smallest possible size of 3, so H = 0.33.

With a mean value of 2.14, the average vertex degree ranges from 1.75 to 2.57 on the 34 studied karsts (Table 5 and Fig. 14). To interpret these values, it is important to remember that this metric is calculated on the reduced graphs of the networks, so that all nodes of degree 2 have been removed. Starting from this observation, if the studied graph is acyclic, i.e., no cycle is observable, and we consider a network with $k_{max} = 3$, a relation exists between the total number of nodes N and $n_3$, the number of nodes of degree 3:

$$N = 2n_3 + 2 \tag{11}$$

It induces a direct relation between the average vertex degree $\bar{k}$ and N:

$$\bar{k} = \frac{2(N - 1)}{N} \tag{12}$$

which implies that $\bar{k} \to 2$ when $N \to \infty$. Moreover, allowing vertex degrees > 3 does not change this observation, as each edge addition also involves the addition of a new vertex of degree 1 (Fig. 15C). Thus, a value $\bar{k} \geq 2$ indicates a network that shows cycles. Also, as soon as one cycle is introduced in the network, $\bar{k}$ only increases if new cycles or vertices with degree > 3 are introduced (Fig. 15).
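As a short check, the two relations for acyclic reduced graphs can be recovered from the fact that the vertex degrees of any graph sum to twice its number of edges. The derivation below assumes a connected tree with N vertices, hence S = N − 1 edges, whose vertices have degree 1 or 3 only; $n_1$, the number of vertices of degree 1, is a symbol introduced here for illustration.

```latex
\begin{align*}
\sum_i k_i &= 2S = 2(N - 1), \qquad N = n_1 + n_3, \\
n_1 + 3 n_3 &= 2(n_1 + n_3) - 2
  \;\Longrightarrow\; n_1 = n_3 + 2
  \;\Longrightarrow\; N = 2 n_3 + 2, \\
\bar{k} &= \frac{1}{N} \sum_i k_i = \frac{2(N - 1)}{N} = 2 - \frac{2}{N}
  \;\xrightarrow[N \to \infty]{}\; 2 .
\end{align*}
```

The last line does not depend on the maximal degree: any connected tree has S = N − 1 edges, so its average degree stays strictly below 2.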

As for lengths, the coefficient of variation of vertex degrees $CV_k$ can be computed to characterize the dispersion of degrees. It is expressed as a percentage and corresponds to the ratio of the standard deviation $\sigma_k$ to the mean $\bar{k}$:

$$CV_k = \frac{\sigma_k}{\bar{k}} \times 100 \tag{13}$$

The computed values of $CV_k$ are relatively high and range from 35% to 60%. Indeed, vertex degrees globally range between 1 and 3, sometimes (but rarely) up to 5. As we work with reduced graphs, all vertices of degree $k_i = 2$ have been removed. Computing an entropy of vertex degrees in this context is not really relevant.
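The average vertex degree and its coefficient of variation can be obtained from an edge list as sketched below. The function name is hypothetical, and the use of the sample standard deviation is an assumption; the text does not state which estimator was used.

```python
import statistics
from collections import Counter


def degree_statistics(edges):
    """Return (mean degree, CV in percent) for a graph given as a list of vertex pairs.
    Vertices without any edge do not appear in the list and are therefore ignored."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    mean = statistics.mean(values)
    cv = statistics.stdev(values) / mean * 100.0  # assumed: sample standard deviation
    return mean, cv


# Hypothetical tree with 8 vertices: three junctions of degree 3 and five end points.
tree_edges = [("A", "B"), ("B", "C"), ("A", "a1"), ("A", "a2"),
              ("B", "b1"), ("C", "c1"), ("C", "c2")]
mean_k, cv_k = degree_statistics(tree_edges)
```

For this hypothetical tree (N = 8, S = 7), the sketch gives a mean degree of 1.75 and a coefficient of variation of about 59.15%, which are also the values listed for Clydach Gorge Caf1 in Table 5; this agreement is suggestive only and does not prove which estimator was used.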

4.2.2. Correlation of vertex degrees: assortativity

The correlation of degrees between ﬁrst neighbour vertices has been found to play an important role in structural and dynamical

Table 5

Topology measures on the 34 reduced graphs of the karst networks: other parameters (mean k is the average vertex degree; CVk is the coefficient of variation of degrees; rk is the correlation of vertex degrees; mean SPL is the average shortest path length; CPD is the central point dominance); maximal and minimal values are highlighted in bold.

Network                  mean k  CVk (%)  rk    mean SPL  CPD
AgenAllwed               2.29    50.05    0.63  11.72     0.54
ArphidiaRobinet          2.18    49.06    0.74  13.28     0.45
Arrestelia               2.03    57.14    0.77  29.16     0.50
BlueSpring (2D)          2.11    47.13    0.73  26.90     0.46
Ceberi                   2.08    51.42    0.63  7.20      0.30
CharentaisHeche          2.22    45.91    0.69  25.40     0.43
ClydachGorge Caf1        1.75    59.15    0.44  2.32      0.56
ClydachGorge Ogof Capel  2.11    52.26    0.49  3.95      0.47
Crevice (2D)             2.07    48.35    0.70  19.66     0.55
Crossroads (2D)          2.32    41.75    0.83  17.33     0.34
DarenCilau               2.13    55.47    0.74  18.44     0.51
EglwysFaen               2.02    59.49    0.59  8.69      0.57
FoussoubieEvent          2.16    48.16    0.72  7.87      0.47
FoussoubieGoule          2.19    45.96    0.76  25.07     0.44
GenieBraque              1.87    60.29    0.55  3.37      0.51
GrottesDuRoy             2.15    48.91    0.75  11.36     0.31
HanSurLesse              2.57    35.26    0.85  7.14      0.20
Krubera                  2.06    58.80    0.58  21.48     0.45
Lechuguilla              2.37    52.01    0.84  55.76     0.49
Llangattwg               1.88    59.65    0.52  4.05      0.16
Mammuthöhle              2.34    53.59    0.72  14.89     0.02
Monachou                 2.18    50.03    0.75  7.58      0.40
OjoDelAgua               2.26    47.42    0.78  19.58     0.38
OxBelHa                  2.00    51.79    0.65  49.42     0.54
PicDuJer                 2.00    49.00    0.59  5.80      0.37
Ratasse                  2.03    55.51    0.70  8.40      0.44
SaintMarcel              2.14    51.53    0.72  21.69     0.23
Sakany                   2.40    47.08    0.88  12.99     0.27
Shuanghe                 2.10    51.02    0.73  25.98     0.27
SiebenHengsteLargePart   2.22    48.12    0.77  47.37     0.46
SiebenHengsteUpPart      2.00    53.03    0.66  4.28      0.14
SiebenHengsteSP1         2.27    45.96    0.75  15.53     0.49
SiebenHengsteSP2         2.28    46.91    0.79  14.04     0.39
Wakulla                  2.11    47.13    0.64  8.11      0.43

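network properties (Maslov and Sneppen, 2002). In order to assess the correlation of degrees of neighbouring vertices in our reduced networks, we compute the Pearson correlation coefficient of the degrees at both ends of an edge (Newman, 2002):

$$r_k = \frac{\dfrac{1}{S}\sum_{j>i} k_i k_j w_{ij} - \left[\dfrac{1}{S}\sum_{j>i} \dfrac{1}{2}(k_i + k_j) w_{ij}\right]^2}{\dfrac{1}{S}\sum_{j>i} \dfrac{1}{2}\left(k_i^2 + k_j^2\right) w_{ij} - \left[\dfrac{1}{S}\sum_{j>i} \dfrac{1}{2}(k_i + k_j) w_{ij}\right]^2} \tag{14}$$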

where S is the total number of edges, and $w_{ij}$ refers to the corresponding entries of the adjacency matrix. If $r_k < 0$, the network is disassortative, i.e., vertices of high degree tend to connect to vertices of low degree (Fig. 16A). If $r_k = 0$, there is no correlation between vertex degrees. If $r_k > 0$, the network is assortative, i.e., vertices of high degree tend to connect with vertices of high degree (Fig. 16B).
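The degree correlation of Eq. (14) can be sketched as follows for an unweighted, undirected edge list, so that every $w_{ij}$ of an existing edge equals 1. The function name is hypothetical, and returning NaN when all edge ends have the same degree is a choice made for this illustration.

```python
import math
from collections import Counter


def degree_correlation(edges):
    """Pearson correlation of the degrees found at both ends of each edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    s = len(edges)
    mean_product = sum(degree[u] * degree[v] for u, v in edges) / s
    mean_half_sum = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / s
    mean_half_squares = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / s
    denominator = mean_half_squares - mean_half_sum ** 2
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        return math.nan  # every edge end has the same degree: r_k is undefined
    return (mean_product - mean_half_sum ** 2) / denominator
```

A star graph, in which one hub is connected only to vertices of degree 1, is perfectly disassortative and gives $r_k = -1$.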

The values obtained on the 34 reduced networks are all positive, ranging from 0.44 to 0.88. The reduced representations of karstic systems are thus assortative (Table 5). This is probably linked to the fact that maximal node degrees rarely exceed 4. Thus, nodes are globally grouped with nodes of similarly low degree.

4.3. Average shortest path length

In the reduced networks, we deﬁne a path as a sequence of edges linking two seed vertices i and j. Because all the edges have a length


Fig. 14. Examples of karstic networks and corresponding average vertex degree: on the left, the Clydach Gorge Caf1, which is a tree network: ¯k = 1.75; on the right Han-sur-Lesse, which has a large number of cycles and nodes of degree 4: ¯k = 2.57.


Fig. 15. Illustration of the average vertex degree meaning for reduced graphs. (A and C) For tree graphs, ¯k is increasing toward 2 as the size of the tree graph increases, but the value 2 is never reached. (B) As soon as one cycle is introduced in the network, ¯k = 2. Then, ¯k is only increasing with the addition of new cycles or of nodes of degree>3.

equal to 1 in the reduced graph, the length of a particular path is given by the number of edges that constitute the path. The shortest path between vertices i and j is therefore the path that links the two vertices with the minimal number of edges. If no path exists between two vertices, i.e., the network is composed of disconnected components, the length is set to ∞. Therefore, in order to avoid divergence of the coefficient, we compute the shortest paths for each connected component of the network separately and then compute the mean over all components. For any vertex i, the shortest path length $SPL_i$ is the average length of the shortest paths that separate i from any other vertex j of the same connected component:

$$SPL_i = \frac{1}{N - 1} \sum_{j} L_{ij} \tag{15}$$

where N is the number of vertices in the network and $L_{ij}$ is the length of the shortest path between i and j. We define the average shortest path length of a network as the mean of the shortest path lengths $SPL_i$:

$$\overline{SPL} = \frac{1}{N} \sum_{i} SPL_i \tag{16}$$
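The computation of Eqs. (15) and (16) can be sketched with breadth-first searches. This is one possible reading, given here under explicit assumptions: N is taken as the size of the connected component being processed, the network value is the unweighted mean of the component values, and single-vertex components are skipped. The function names are hypothetical, and the graph is a dictionary mapping each vertex to its neighbours.

```python
from collections import deque


def _bfs_distances(adj, source):
    """Number of edges from `source` to every vertex of its connected component."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance


def average_shortest_path_length(adj):
    """Mean over connected components of the mean SPL_i (needs at least one edge)."""
    seen = set()
    component_means = []
    for start in adj:
        if start in seen:
            continue
        component = _bfs_distances(adj, start)  # its keys are the component vertices
        seen.update(component)
        n = len(component)
        if n < 2:
            continue  # an isolated vertex has no path to average
        spl = [sum(_bfs_distances(adj, i).values()) / (n - 1) for i in component]
        component_means.append(sum(spl) / n)
    return sum(component_means) / len(component_means)
```

A complete graph gives a value of 1, and a linear chain of three vertices gives 4/3.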

The average shortest path length $\overline{SPL}$ ranges from 2.32 to 55.76 for the 34 studied networks, with a mean value of 16.93. The $\overline{SPL}$ is


Fig. 16. Correlation of vertex degrees is a measure of assortativity. (A) An example of disassortative network (rk<0): vertices of high degree tend to connect to vertices of low degrees; (B) an example of assortative network (rk>0): vertices of high degree tend to connect to vertices of high degrees (‘hubs connected to hubs’).


Fig. 17. Average shortest path length, $\overline{SPL}$: (A) and (C) are complete graphs, in which each node is directly linked to all other ones by one edge, thus $\overline{SPL}_A = \overline{SPL}_C = 1$ independently of the total number of nodes N. Conversely, (C) and (G) are linear graphs, but with different sizes: (C) is longer than (G), thus $\overline{SPL}_B < \overline{SPL}_D$. As we compute $\overline{SPL}$ on a reduced graph, these linear structures cannot be observed (all nodes of degree 2 have been removed). Cases (B) and (E) demonstrate that, for the same spatial architecture, $\overline{SPL}$ increases with the total number of nodes N. Cases (E) and (F) demonstrate that, for the same total number of nodes N, $\overline{SPL}$ also varies depending on the architecture of the tree graph: case (F) is more linear than case (E).

related to the efficiency of transfer processes through the network. For example, applied to the World Wide Web, a short $\overline{SPL}$ accelerates the transfer of information. Here, it is calculated on the reduced representations of the karst networks, which ignore real distances. It thus only characterizes the efficiency of the network in terms of its structure or topology. The average shortest path length is jointly influenced by two parameters: (i) the linearity of the network and (ii) the size of the network (Figs. 17 and 18).

4.4. Betweenness centrality and central point dominance (CPD)

The betweenness centrality of a vertex i is a measure of the vertex importance in a network. We consider it as a topological measure

Fig. 18. Examples of karstic networks and their associated $\overline{SPL}$ value. (A and A′) The Clydach Gorge Caf1 network is a short one with $\overline{SPL} = 2.32$; (A) presents the real network with only the seed vertices visible, and (A′) presents the corresponding not-to-scale view of its reduced representation. (B) The Lechuguilla cave has $\overline{SPL} = 55.76$.

Fig. 19. Central point dominance (CPD): (A) example of a complete graph, i.e., a graph in which each vertex has a direct edge to all the other vertices: CPD = 0; (B) for a star graph CPD = 1, i.e., a central vertex is included in all paths.

and therefore compute it on the reduced network. The betweenness centrality relates to the number of shortest paths that cross a vertex i through the following relation:

$$B_i = \sum_{j \neq k} \frac{n_{j,k}(i)}{n_{j,k}} \tag{17}$$

where $n_{j,k}(i)$ is the number of shortest paths between the vertices j and k that pass through the vertex i, and $n_{j,k}$ is the total number of shortest paths between j and k. The sum is taken over all distinct pairs j, k of vertices other than i, each pair being counted in both directions, which is what the normalization below requires. $B_i$ indexes the potential of a point for control. As it is essentially a count, its magnitude depends, among other things, upon the number of points in the graph. To eliminate this effect, Freeman (1977) proposed to use a relative centrality, $B^*_i$. For any undirected star graph, this normalization has to guarantee that the relative betweenness centrality of the central point is equal to 1. Thus

$$B^*_i = \frac{B_i}{N^2 - 3N + 2} \tag{18}$$

where N is the number of nodes in the network. The central point dominance of a network is then defined by Freeman (1977) as

$$CPD = \frac{1}{N - 1} \sum_{i} \left(B^*_{max} - B^*_i\right) \tag{19}$$

where $B^*_{max}$ is the largest value of relative betweenness centrality in the network.

Fig. 20. Examples of central point dominance (CPD) results: (A) the Mammuthöhle cave consists of 25 connected components and has a dispersed organization, CPD = 0.02; (B) Eglwys Faen is characterized by the existence of several high-degree nodes ($k_{max} = 5$) and a more centralized pattern, CPD = 0.57.

The central point dominance is 0 for a complete graph, i.e., a graph in which each vertex has a direct edge to all the other vertices, and 1 for a star graph, in which a central vertex is included in all paths (Fig. 19A and B). On karstic systems, we obtained values ranging from 0.02 to 0.57 with a mean value of 0.39. The highest value is obtained for the Eglwys Faen cave, which is characterized by several vertices of degree k ≥ 4 and is quite centralized. Conversely, the Mammuthöhle cave, which is composed of 25 connected components, all of them without noticeable centralized organization, gets the lowest value of our data set with CPD = 0.02 (Fig. 20).
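The central point dominance can be sketched as below. Betweenness is accumulated with Brandes' algorithm, which is a standard technique swapped in here for illustration and is not stated in the text to be the method of the study. The function names are hypothetical, each pair of vertices is counted in both directions so that dividing by N² − 3N + 2 gives 1 for the centre of a star, and the graph is assumed to have at least three vertices.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of an unweighted graph given as {vertex: neighbours};
    every pair (j, k) is counted in both directions."""
    b = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        predecessors = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        n_paths[s] = 1
        distance = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths from s
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
                if distance[v] == distance[u] + 1:
                    n_paths[v] += n_paths[u]
                    predecessors[v].append(u)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):  # back-propagate from the farthest vertices
            for u in predecessors[w]:
                dependency[u] += n_paths[u] / n_paths[w] * (1.0 + dependency[w])
            if w != s:
                b[w] += dependency[w]
    return b


def central_point_dominance(adj):
    n = len(adj)  # assumed: n >= 3, otherwise the normalization is undefined
    relative = [value / (n * n - 3 * n + 2) for value in betweenness(adj).values()]
    largest = max(relative)
    return sum(largest - value for value in relative) / (n - 1)
```

The two reference cases are recovered: a star graph gives CPD = 1 and a complete graph gives CPD = 0.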

5. Comparison of the metrics and discussion

We proposed different metrics for the characterization of karst geometry and topology. All of them have been computed on 34 karstic networks coming from various parts of the world and related to different speleogenetic processes. In this section, we study the relations between all these metrics in order to identify redundant ones and to better understand their meaning. To start the analysis, Table 6 presents the linear correlation coefficients that we computed between the different metrics.

5.1. Geometrical parameters

We chose to focus the geometrical analysis on orientation, length, and tortuosity of conduits.

Table 6
Linear (Pearson) correlation coefficients r between all studied metrics. Values are distinguished in two ranges: 0.7 ≤ |r| < 0.85, and |r| ≥ 0.85, which expresses a strong linear correlation between both variables.

Fig. 21. Common patterns of solutional caves according to Palmer (1991) and links with some topological metrics we propose: interconnectivity increases with the average node degree ¯k; orientation entropy HO is lower when clear preferential directions of karst development exist; the central point dominance CPD should be higher for ramiform patterns, and spongework patterns could be characterized by low values of average shortest path length ¯SPL. Nonetheless, especially for the last two parameters, these relations still need to be demonstrated with a dedicated statistical analysis supported by speleogenetic considerations.

The orientation analysis helps in identifying preferential directions of speleogenesis that can be linked to the existence of inception features, like fractures, bedding, and faults. Orientation also underlines the curvilinear patterns of karsts developed along almost horizontal bedding planes. Our results show the high variety of orientation patterns and the difficulty of defining general reference cases. The use of orientation entropy, as recently proposed for urban street analysis (Gudmundsson and Mohajeri, 2013), provides an interesting tool to quantitatively classify networks on orientation criteria. However, density map analysis is necessary to complement this metric and especially to detect the existence of a vertical preferential development of conduits.
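To make the idea of orientation entropy concrete, the sketch below computes a normalized Shannon entropy of conduit azimuths. It assumes unweighted azimuths, 18 equal bins over [0°, 180°), and a normalization by the logarithm of the number of bins; these choices and the function name are assumptions made for illustration and may differ from the definition of HO used in this study.

```python
import math

def orientation_entropy(azimuths_deg, n_bins=18):
    """Normalized Shannon entropy of orientations, between 0 and 1."""
    if not azimuths_deg:
        raise ValueError("no orientation data")
    if n_bins < 2:
        raise ValueError("at least two bins are required")
    width = 180.0 / n_bins
    counts = [0] * n_bins
    for az in azimuths_deg:
        # Orientations are axial: 10 deg and 190 deg describe the same line.
        idx = int((az % 180.0) // width)
        counts[min(idx, n_bins - 1)] += 1
    total = len(azimuths_deg)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n_bins)

# Hypothetical networks: one single direction versus evenly spread directions
one_direction = [30.0] * 10
all_directions = [5.0 + 10.0 * i for i in range(18)]
print(orientation_entropy(one_direction), orientation_entropy(all_directions))
```

Under these assumptions, a network developed along one direction gives an entropy of 0, and orientations spread evenly over all bins give an entropy of 1.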

The average length, length entropy, and the coefficient of variation of conduit length are interesting parameters to describe the geometry of karstic networks. The length entropy informs about the distribution of conduit dimensions, while the coefficient of variation shows the extent of variability relative to the mean. A negative nonlinear correlation is observed between the length entropy and the size of the network (expressed through the numbers of survey stations n, of lines of sight s, of seed vertices N, and of karst branches S). Indeed, the computation of entropy divides the samples into bins. Thus, for the smallest karsts, the probability of obtaining a bin that contains only one or two samples is higher. So, even if the population is tightly grouped around the mean (low value of the coefficient of variation), the resulting distribution is more likely to be uniform.
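The binning effect described above can be illustrated with a minimal sketch. It assumes a sample standard deviation for the coefficient of variation and equal-width bins spanning the minimum and maximum lengths; both choices, the function names, and the sample lengths are assumptions made for illustration, not the definitions used in this study.

```python
import math
import statistics

def coefficient_of_variation(lengths):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(lengths) / statistics.mean(lengths)

def length_entropy(lengths, n_bins):
    """Normalized Shannon entropy of lengths binned between their min and max."""
    if n_bins < 2:
        raise ValueError("at least two bins are required")
    lo, hi = min(lengths), max(lengths)
    if hi == lo:
        return 0.0  # every sample falls in the same bin
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in lengths:
        # The maximum value is assigned to the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(lengths)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(n_bins)

# Hypothetical small network: three branch lengths close to their mean
small = [10.0, 12.0, 14.0]
print(coefficient_of_variation(small), length_entropy(small, 3))
```

With three closely grouped lengths and three bins, each bin receives one sample: the coefficient of variation is low (about 0.17) while the normalized entropy reaches its maximum of 1.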

Tortuosity, also called sinuosity index, has often been used to describe karst geometry (Pardo-Iguzquiza et al., 2011; Jeannin et al., 2007). Nonetheless, the computed values reveal only slight variations that do not seem to correspond to a particular or clear curvilinear karst pattern classification when visualizing the corresponding systems. We suspect that tortuosity is strongly affected by the survey methodology. If this is true, it should be used as a criterion or as an input parameter for conduit geometry simulation only when data were acquired following a very well-defined sampling procedure, i.e., with stations located in the center of the conduits and a fixed distance between the stations.
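The suspected sensitivity to station spacing can be illustrated with a minimal sketch, assuming that tortuosity is taken as the length measured along the surveyed stations of a branch divided by the straight-line distance between its two ends. The function name and the coordinates are hypothetical.

```python
import math

def tortuosity(stations):
    """Length along the stations divided by the end-to-end distance."""
    if len(stations) < 2:
        raise ValueError("a branch needs at least two stations")
    path = sum(math.dist(p, q) for p, q in zip(stations, stations[1:]))
    chord = math.dist(stations[0], stations[-1])
    if chord == 0:
        raise ValueError("undefined for a branch that closes on itself")
    return path / chord

# The same hypothetical zigzag conduit surveyed with two station spacings
dense = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]
sparse = dense[::2]  # only every second station is kept
print(tortuosity(dense), tortuosity(sparse))
```

In this hypothetical example, the same conduit yields a tortuosity of about 1.41 when every station is kept and exactly 1 when only every second station is surveyed.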

Other geometrical metrics have been investigated in previous papers (Howard, 1971; Pardo-Iguzquiza et al., 2011). In particular, a set of metrics uses widths, passage areas, and mean height diameters. These metrics provide important volumetric information, but they require field data that were not available for all the karst systems considered in this study, which is why they are not included here. Such a volumetric description is, however, clearly vital for providing a complete description of a cave system.

5.2. Topological parameters

Concerning topology, we used several metrics to describe the network organization. Considering their correlation coefficients, some of them appear to express similar characteristics of the networks. It is important to remember that topological metrics are computed on reduced graphs: all vertices of degree k = 2 are removed from the networks.

The interconnectivity of the network is jointly expressed by all Howard's parameters a, b, and c, the resulting degree of connectivity DC, the global cyclic coefficient h, and finally, the average vertex degree ¯k. All these parameters are indeed highly linearly correlated (Table 6). Additionally, a perfect linear correlation (r = 1) is observed between b and ¯k, which are linked by the relation ¯k = 2b. As a consequence, we recommend keeping only one of these metrics for characterizing the interconnectivity of the network. Our choice goes to the average vertex degree ¯k, which is quite easy to compute. As demonstrated in Section 4.2.1, ¯k has the advantage of discriminating tree graphs (with values 1.5 ≤ ¯k < 2) from interconnected systems (values ¯k ≥ 2). Note that these peculiarities of ¯k are completely linked to the fact that we compute it on reduced graphs. In other studies, e.g., on fracture networks (Andresen et al., 2013), the presence of nodes of degree 2 leads to values ¯k ≤ 1.5.
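The relation ¯k = 2b follows from the fact that each edge contributes to the degree of two vertices: with S edges and N vertices, ¯k = 2S/N. The sketch below illustrates this on a reduced graph, taking b as S/N, which is the definition consistent with ¯k = 2b. The reduction routine, the choice to keep a closed loop as a self-loop edge, the function names, and the two small networks are assumptions made for illustration, not the procedure used in this study.

```python
def reduce_graph(edges):
    """Merge away every vertex of degree 2; return the remaining edges."""
    edges = list(edges)
    while True:
        incident = {}
        for i, (u, v) in enumerate(edges):
            incident.setdefault(u, []).append(i)
            incident.setdefault(v, []).append(i)
        # A self-loop is listed twice at its vertex, so len(idx) is the degree.
        target = next((n for n, idx in incident.items()
                       if len(idx) == 2 and idx[0] != idx[1]), None)
        if target is None:
            return edges
        i, j = incident[target]
        a = edges[i][0] if edges[i][1] == target else edges[i][1]
        b = edges[j][0] if edges[j][1] == target else edges[j][1]
        # Replace the two edges meeting at the target by one direct edge.
        edges = [e for k, e in enumerate(edges) if k not in (i, j)] + [(a, b)]

def mean_degree(edges):
    """Average vertex degree: 2 S / N."""
    nodes = {n for e in edges for n in e}
    return 2 * len(edges) / len(nodes)

def howard_beta(edges):
    """Ratio S / N of the number of edges to the number of vertices."""
    nodes = {n for e in edges for n in e}
    return len(edges) / len(nodes)

# Hypothetical Y-shaped (acyclic) network and a loop with one entrance passage
tree = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7)]
loop = [(1, 2), (2, 3), (3, 4), (4, 2)]
for name, net in (("tree", tree), ("loop", loop)):
    reduced = reduce_graph(net)
    print(name, mean_degree(reduced), howard_beta(reduced))
```

On these two hypothetical networks, the reduced tree gives ¯k = 1.5 and the reduced loop gives ¯k = 2, in line with the ranges quoted above for acyclic and interconnected systems.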

The correlation of vertex degrees has shown that the reduced graphs of karstic systems are all assortative (0.44 ≤ rk ≤ 0.88). Previously, Newman (2002) showed that many social networks are assortative while technological and biological networks seem to be disassortative. It is perhaps linked to the fact that high-degree vertices are quite rare in karstic networks: we did not record k values > 5. As a comparison, the equivalent graphs of transformed fracture networks proposed by Andresen et al. (2013) were characterized by a mean value kmax = 29. Assortativity is somewhat linked to ¯k (r = 0.79, Table 6), but the low correlation coefficients obtained with other metrics relating to interconnectivity (a, c, DC, h) show that it does not express the interconnectivity of the network. It is a complementary metric.
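As a minimal sketch of what the correlation of vertex degrees measures, the code below computes the Pearson correlation between the degrees found at the two ends of each edge, each undirected edge being counted in both directions, in the spirit of Newman (2002). The function name and the two small graphs are hypothetical, and the exact estimator used for rk in this study may differ.

```python
from collections import Counter

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    x, y = [], []
    for u, v in edges:
        # Count each undirected edge in both directions.
        x += [deg[u], deg[v]]
        y += [deg[v], deg[u]]
    mean = sum(x) / len(x)  # x and y share the same mean by symmetry
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y))
    var = sum((a - mean) ** 2 for a in x)
    if var == 0:
        raise ValueError("undefined when all edge ends have the same degree")
    return cov / var

# Hypothetical graphs: a star (hub linked only to leaves) and a graph whose
# vertices are linked only to vertices of the same degree
star = [(0, 1), (0, 2), (0, 3)]
alike = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (5, 6)]
print(degree_assortativity(star), degree_assortativity(alike))
```

The star, where a high-degree vertex connects only to low-degree ones, gives -1 (disassortative), whereas the second graph, where vertices connect only to vertices of equal degree, gives +1 (assortative).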

Average shortest path length ( ¯SPL) is a standard metric in graph