Comparing connected structures in ensemble of random fields


Guillaume Rongier a,b,∗, Pauline Collon a, Philippe Renard b, Julien Straubhaar b, Judith Sausse a

a GeoRessources (UMR 7359, Université de Lorraine / CNRS / CREGU), Vandoeuvre-lès-Nancy, F-54518, France
b Centre d'Hydrogéologie et de Géothermie, Université de Neuchâtel, 11 rue Emile-Argand, 2000 Neuchâtel, Switzerland

Keywords: Stochastic simulations; Comparison; Static connectivity; Indicators; Dissimilarity

Abstract

Different simulation methods or parameter sets can produce very different connectivity patterns, and therefore different flow properties. This paper proposes a systematic method to compare ensembles of categorical simulations from a static connectivity point of view. Differences in static connectivity cannot always be distinguished using two-point statistics. In addition, multiple-point histograms only provide a statistical comparison of patterns, regardless of the connectivity. We therefore propose to characterize the static connectivity with a set of 12 indicators based on the connected components of the realizations. Some indicators describe the spatial distribution of the connected components, others their global shape or, through the component skeletons, their topology. We also gather all the indicators into dissimilarity values to easily compare hundreds of realizations. Heat maps and multidimensional scaling then facilitate the dissimilarity analysis. The application to a synthetic case highlights the impact of the grid size on the connectivity and the indicators. This impact disappears when comparing samples of the realizations that have the same size. The method is then able to rank realizations against a reference model based on their static connectivity. This application also yields more practical advice. Multidimensional scaling is a powerful visualization tool, but it also misrepresents some dissimilarities: it should always be interpreted cautiously, with a look at the point position confidence. The heat map displays the real dissimilarities and is more appropriate for a detailed analysis. The comparison with a multiple-point histogram method shows the benefit of the connected components: the large-scale connectivity seems better characterized by our indicators, especially the skeleton indicators.

1. Introduction

Connectivity is a key aspect of a geological study because of its influence on fluid circulation. From a reservoir engineering perspective, it relates to geological structures with high and low permeabilities. But it also relates to the spatial distribution of these structures and the resulting inter-connections, which define the static connectivity. An incorrect connection can bias the results of the flow simulations (Gómez-Hernández and Wen, 1998; Journel and Alabert, 1990). Reproducing the geological bodies together with their relations is therefore of prime importance (e.g., Deutsch and Hewett, 1996; King and Mark, 1999).

Stochastic simulations aim at generating possible representations of the geological bodies with respect to the available data. Several methods exist, with a usual separation in two main categories:

∗ Corresponding author.

E-mail address: guillaume.rongier@univ-lorraine.fr (G. Rongier).

• Pixel-based methods simulate one cell at a time, based on a prior model describing the structures of interest. In sequential indicator simulation (SIS) (Deutsch and Journel, 1992), the prior is a variogram built upon the two-point statistics of the data. Hard data conditioning with such a method is easy, but the simulated structures do not look like geological bodies. This is especially true for bodies with curvilinear geometries such as channels, whose continuity is poorly preserved. The plurigaussian simulation (PGS) (Galli et al., 1994) limits this difficulty by accounting for the facies relationships. Multiple-point simulations (MPS) go a step further by borrowing multiple-point statistics not from the data but from an external representation of the expected geology, the training image (TI) (Guardiano and Srivastava, 1993).

• Object-based methods rely on the definition of geometric forms and their associated parameters. Each form represents a particular geological body (e.g., Viseur, 2001; Deutsch and Tran, 2002). The objects are then randomly placed in the domain of interest, with parameters drawn from statistical laws. More recent approaches introduce some genetic aspects to improve the object organization (e.g., Lopez, 2003; Pyrcz et al., 2009). They provide more geologically consistent results. For instance, channel continuity and relationships are better preserved than with pixel-based methods. But this comes at the cost of the ease of parametrization, and object-based approaches have difficulty conditioning the objects to data.

Published in Advances in Water Resources 96, 145-169, 2016

All these methods have advantages and drawbacks, which influence the choice of a method and of its parameter values when dealing with a case study.

But little work aims at systematically analyzing the quality of a set of realizations regarding their static connectivity. The quality control often consists in comparing the histogram and variogram of several realizations with those of the data, or of the training image if any (e.g., Strebelle, 2002; Mariethoz et al., 2010; Tahmasebi et al., 2012). If more than the statistics of the first two orders are necessary to simulate geological bodies (e.g., Guardiano and Srivastava, 1993; Journel, 2004), the same conclusion must apply when comparing realizations. Some authors propose to also use higher-order statistics for quality analysis. Boisvert et al. (2010) and Tan et al. (2014) propose to analyze the multiple-point histogram. De Iaco and Maggio (2011) and De Iaco (2013) also explore the multiple-point statistics with high-order cumulants.

The purpose of most simulation methods is to reproduce statistics from a prior. Analyzing statistics shows how well the method reproduces them, not whether the realizations are geologically consistent. For that, the statistical analysis is often complemented by a visual evaluation of the global structures. The geological structures are compared to what is expected from the known geology, with a focus on the further use of the realizations. This use is often related to fluid circulation and requires an assessment of the static connectivity, which, contrary to the statistics, is not directly imposed by the simulation methods. But a visual analysis remains subjective and limited to a few realizations, often in two dimensions (e.g., Yin et al., 2009; Tahmasebi et al., 2012).

Yet, some studies focus on analyzing the connectivity of the realization bodies. For instance, Meerschman et al. (2012) use the connectivity function with the histogram and variogram to analyze the impact of the simulation parameters for the Direct Sampling MPS method (Mariethoz et al., 2010). Deutsch (1998) directly uses the connected components determined from lithofacies, porosity and permeability models. He computes indicators such as the number of connected components or their sizes to rank the realizations. De Iaco and Maggio (2011) and De Iaco (2013) also use some measures related to the connected components, such as their number or their mean surface and volume. Comunian et al. (2012) rely on some of the previous indicators to analyze the quality of three-dimensional structures simulated from two-dimensional training images. They also consider the equivalent hydraulic conductivity tensor as an indicator. However, this requires some knowledge of the hydraulic conductivities of the simulated facies.

Connected components make it possible to characterize the geometry and topology of the geological bodies, which is the purpose of the visual comparison of realizations. They also make it possible to study the static connectivity of the geological bodies, while being easy to compute. Unlike a visual analysis of the realizations, indicators from connected components are unbiased and can compare many realizations. Unlike statistical or hydraulic property indicators, they focus on the sedimentary bodies by characterizing their connectivity and are easier to interpret. However, current methods based on the connected components are limited to a few simple indicators, often analyzed independently.

This raises the question of how to visualize the results to analyze the indicators more effectively. Scheidt and Caers (2009) and Tan et al. (2014) both rely on the computation of dissimilarity values between the realizations. Those dissimilarities are computed based on the quality indicators measured on each realization. They are then visualized using multidimensional scaling (MDS) (e.g., Torgerson, 1952; Shepard, 1962a; 1962b). MDS represents the realizations as points, with the distances between the points as close as possible to the dissimilarities. This makes the global analysis of the realization dissimilarities easier.

The present work aims at analyzing and discussing a set of indicators to quantify the quality of stochastic simulations from the viewpoint of static connectivity. The method operates on categorical three-dimensional images representing the facies that constitute the geological bodies of interest. It can be applied to realizations from one or several stochastic simulation methods and/or parameter values. Conceptual images representing ideally the structures to simulate can also be considered. The chosen set of indicators relies on quantitative measurements on connected components and their skeletons (Section 2). The indicators are used in dissimilarity computations to analyze the quality more directly (Section 3). Several realizations obtained with different simulation methods (Section 4.1) are then used to test the method and compare it to the multiple-point histograms (Section 4), and discuss the results (Section 5).

2. Indicators to measure simulation quality

The quality analysis fits within a stochastic process involving the simulation of many realizations in a grid. It further investigates the differences of static connectivity between these realizations.

2.1. About grids and grid cells

Many methods to simulate geological structures rely on a discretized representation of the domain of interest: a grid. The grid is a volumetric mesh composed of simple elements, hereinafter called cells.

Many types of grid exist, with different cell types (e.g., tetrahedron or hexahedron). Most of the stochastic simulation methods rely on hexahedral grids, either regular or irregular. Irregular hexahedral grids help to conform as closely as possible to geological structures such as horizons and faults. The sedimentary bodies are then simulated within the parametric space of the grid (e.g., Shtuka et al., 1996). The parametric space mimics a deposition space to get rid of the deformation and faulting that occur after deposition and are materialized in the grid geometry.

Consequently, the indicators are computed on hexahedral grids, both regular and irregular. Similarly to the simulation, the indicator computation is done in the parametric space of the grid. Thus, the indicators based on volumes or surfaces are instead computed using numbers of cells and numbers of faces. This avoids biases related to different grid geometries, which would give different indicator values for objects that are identical once transferred to the same grid. Within a grid, the cells are connected to one another by their faces, their edges and/or their corners (Renard and Allard, 2013). In the case of the hexahedral grids used for this work, one cell has three possible neighborhoods (Fig. 1):

1. One neighborhood composed of six face-connected cells.

2. One neighborhood composed of eighteen face- and edge-connected cells.

3. One neighborhood composed of twenty-six face-, edge- and corner-connected cells.

This definition of the connectivity between a cell and its neighborhood can be extended to form connected components.
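The three neighborhoods can be enumerated from index offsets in a regular hexahedral grid. The following is a minimal sketch in Python, not the implementation used for this work; the function name `neighbor_offsets` and the `(di, dj, dk)` offset convention are illustrative assumptions.

```python
from itertools import product


def neighbor_offsets(connectivity):
    """Return the (di, dj, dk) index offsets of a cell neighborhood.

    connectivity is 6 (faces), 18 (faces and edges) or
    26 (faces, edges and corners).
    """
    # A face neighbor differs along 1 axis, an edge neighbor along 2 axes,
    # and a corner neighbor along 3 axes.
    max_nonzero = {6: 1, 18: 2, 26: 3}[connectivity]
    offsets = []
    for d in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for c in d if c != 0)
        if 1 <= nonzero <= max_nonzero:
            offsets.append(d)
    return offsets
```

For instance, `len(neighbor_offsets(18))` gives the eighteen face- and edge-connected cells of the second neighborhood.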

2.2. Basic element: the connected component

Fig. 1. Possible neighborhoods for a given central cell in a regular grid (modified from Deutsch (1998)).

Fig. 2. Connected components of a given facies in a two-dimensional structured grid. The cells a and b are connected and belong to the same connected body. There is no possible connected path between those cells and c, which belongs to another connected body. The cell d constitutes a third connected body in the case of a face-connected neighborhood. In the case of an edge- or corner-connected neighborhood, d belongs to the connected body 1.

The connected components result from the widening of the neighborhoods. They rely on the following definition of the connectivity between two cells: two cells belonging to the same facies are connected if a path of neighboring cells remaining within the same facies exists (Fig. 2). Applying this definition to all the cells of a facies gives the connected components of this facies.
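This definition translates into a standard graph traversal. The sketch below labels the connected components of one facies with a breadth-first search. It assumes a regular grid stored as nested Python lists indexed `grid[i][j][k]`; the function name and data layout are illustrative, not those of the plugin used for this work.

```python
from collections import deque
from itertools import product


def connected_components(grid, facies, connectivity=6):
    """Return the connected components of a facies as sets of (i, j, k) cells."""
    # Offsets of the chosen neighborhood: 6, 18 or 26 neighbors.
    max_nonzero = {6: 1, 18: 2, 26: 3}[connectivity]
    offsets = [d for d in product((-1, 0, 1), repeat=3)
               if 1 <= sum(c != 0 for c in d) <= max_nonzero]
    ni, nj, nk = len(grid), len(grid[0]), len(grid[0][0])
    seen = set()
    components = []
    for start in product(range(ni), range(nj), range(nk)):
        i, j, k = start
        if grid[i][j][k] != facies or start in seen:
            continue
        # Breadth-first search over neighboring cells of the same facies.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            ci, cj, ck = queue.popleft()
            for di, dj, dk in offsets:
                n = (ci + di, cj + dj, ck + dk)
                inside = 0 <= n[0] < ni and 0 <= n[1] < nj and 0 <= n[2] < nk
                if inside and n not in seen and grid[n[0]][n[1]][n[2]] == facies:
                    seen.add(n)
                    component.add(n)
                    queue.append(n)
        components.append(component)
    return components
```

As with cell d in Fig. 2, two cells that only touch by an edge form two components with `connectivity=6` and a single component with `connectivity=18` or `26`.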

This leads to a distinction between the geological objects, such as a channel or a crevasse splay, and the connected components. Indeed, the geological objects often tend to cross each other, giving one connected body where there are in fact several geological objects (Fig. 2). The range of possible shapes is larger for the connected components than for the individual objects. This aspect complicates the comparison between images. But determining the connected components is far easier than trying to retrieve the geological objects. This is also close to the way pixel-based methods work: they do not try to reproduce geological objects but groups of cells, and therefore connected components.

2.3. Basic element: the skeleton

A curve-skeleton, simply called skeleton here, is a thin one-dimensional representation of a three-dimensional shape. It is composed of nodes linked together by one or more segments (Fig. 3). The degree of a node is the number of segments connected to that node. Skeletons are often used to study some geometrical and topological features of a shape. Here the skeletons are those of the connected components. They make it possible to further characterize the global shape of the connected components, while giving more details about their topology than indicators directly computed on the components.

Fig. 3. Example of skeletons for the connected components of Fig. 2. Here the nodes connected to only one segment, i.e., the nodes of degree 1, are all along a grid border. Two nodes of degree three highlight the local disconnections between the channels at the bottom. The connected component at the top has no node of degree higher than two, which shows the complete connectivity of all its cells, even locally.

Several methods exist to compute skeletons (e.g., Serra, 1983; Jain, 1989; Brandt and Algazi, 1992). The method considered for this work is based on slicing the grid along a given axis. The grid is subdivided into parallel slices of a given thickness. On each slice the connected components are computed and one node is assigned to each component. The nodes are then linked by computing the connected components over two adjacent slices. If two components from two slices form one connected component when the slices are combined, their nodes are linked. If they form several components, their nodes are not linked.
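The slicing procedure can be sketched as follows. This is a simplified reading of the description above, not the authors' implementation: it assumes face connectivity, slices along the first axis, node positions at the centroid of each slice component, and links every pair of nodes from two adjacent slices that fall in the same combined component. All names are hypothetical.

```python
from collections import deque

FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def components(cells):
    """Split a set of (i, j, k) cells into face-connected components."""
    remaining = set(cells)
    result = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in FACE_OFFSETS:
                n = (i + di, j + dj, k + dk)
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        result.append(component)
    return result


def slice_skeleton(cells, thickness=1):
    """Return (nodes, segments) of a skeleton built by slicing along axis i."""
    cells = set(cells)
    if not cells:
        return [], set()
    i_min = min(c[0] for c in cells)
    i_max = max(c[0] for c in cells)
    nodes = []    # node position: centroid of a slice component (assumption)
    owner = {}    # cell -> index of the node of its slice component
    slices = []   # cells of each slice, ordered along the slicing axis
    for lo in range(i_min, i_max + 1, thickness):
        in_slice = {c for c in cells if lo <= c[0] < lo + thickness}
        slices.append(in_slice)
        for comp in components(in_slice):
            centroid = tuple(sum(c[a] for c in comp) / len(comp) for a in range(3))
            for c in comp:
                owner[c] = len(nodes)
            nodes.append(centroid)
    segments = set()
    for first, second in zip(slices, slices[1:]):
        # Components computed over two adjacent slices decide the links.
        for comp in components(first | second):
            a = {owner[c] for c in comp if c in first}
            b = {owner[c] for c in comp if c in second}
            segments.update((u, v) for u in a for v in b)
    return nodes, segments


def node_degrees(nodes, segments):
    """Number of segments connected to each node."""
    degrees = [0] * len(nodes)
    for u, v in segments:
        degrees[u] += 1
        degrees[v] += 1
    return degrees
```

The degrees returned by `node_degrees` are the raw material for the node degree proportions introduced in the next subsection.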

2.4. Indicators

The indicators studied in this paper focus on analyzing the connectivity of the geological bodies within a three-dimensional image. This static connectivity analysis is possible thanks to the connected components. All the indicators are quite simple and each one gives only partial information about the connectivity and its structure. But their combination provides a more detailed characterization.

Appendix A defines in detail all the indicators. Table 1 summarizes the indicator definitions, focusing on their relationship with the connected components.

We distinguish three categories of indicators:

Global indicators: The global indicators characterize a facies and not necessarily an individual connected component. Among them, the facies proportion is a classical indicator to compare realizations. Some others, such as the facies connection probability (Renard and Allard, 2013), the connected component density or the traversing component proportion, give an idea of the global connectivity.

Shape indicators: Global measures such as facies proportions are not sufficient to characterize precisely the impact of the related facies on the flow (e.g., Western et al., 2001; Mariethoz, 2009). In particular, Oriani and Renard (2014) showed the influence of the connected component geometry, i.e., their shape, on the equivalent hydraulic conductivity, and therefore on the flow behavior. The shape indicators characterize the connected component shape through simple surface and volumetric measures. They all give one value per component. The arithmetic mean of those values provides a value of the indicator for a given facies. This makes the indicator comparison easier.

Skeleton indicators: The skeletons help to better characterize the topology and global geometry of their connected components: their one-dimensional representation is easier to analyze. Here two indicators are introduced. The inverse branch tortuosity characterizes the geometry of the skeleton. Its values for all the branches of all the skeletons related to a facies are averaged to obtain a single value for the facies. It complements the shape indicators in the characterization of the connected component shape. The node degree proportion depicts the topology of the skeletons. It helps to analyze the connectivity more precisely.

Table 1
The set of indicators with their definitions. Each indicator definition is given by a numerator and a denominator because the majority of the indicators come from a ratio. Some ratios are computed on a single connected component, and their values are combined to obtain one indicator value per facies. We use the term "component" instead of "connected component" for the sake of simplicity.

| Category | Indicator | Symbol | Numerator | Denominator | Value for a facies |
|---|---|---|---|---|---|
| Global indicators | Facies proportion | p | Number of component cells | Number of cells of the grid | Sum of all the component values |
| | Facies adjacency proportions | pa | Number of component cells adjacent to a cell of a given other facies | Number of cells of the facies adjacent to a cell of any other facies | Sum of all the component values |
| | Facies connection probability | | Squared number of component cells | Squared number of cells of the facies | Sum of all the component values |
| | Connected component density | | Number of components | Number of cells of the grid | – |
| | Unit component proportion | pu | Number of components of one cell | Number of components | – |
| | Traversing component proportion | pc | Number of components linking two opposite borders of the grid | Number of components | – |
| Shape indicators | Number of component cells | n | Number of component cells | – | Average of the non-unit-component values |
| | Box ratio | β | Number of cells of a component | Number of cells of the axis-aligned bounding box of the component | Average of the non-unit-component values |
| | Faces/cells ratio | ζ | Number of faces composing a component surface | Number of cells of a component | Average of the non-unit-component values |
| | Sphericity | φ | Surface area of a sphere | Surface area of the connected component | Average of the non-unit-component values |
| Skeleton indicators | Inverse branch tortuosity | t | Distance between the extremities of a branch | Branch curvilinear length | Average of all the branch values |
| | Node degree proportions | pn | Number of nodes connected to n segments | Total number of nodes for all the skeletons | Sum of all the skeleton values |
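To make the ratios of Table 1 concrete, the sketch below computes a few global and shape indicators from components given as sets of `(i, j, k)` cells. It is only an illustration under stated assumptions: function names are hypothetical, at least one component must exist, and faces lying on the grid border are counted as part of the component surface, which the detailed definitions of Appendix A may treat differently.

```python
FACE_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def global_indicators(components, n_grid_cells):
    """A few global indicators of one facies (assumes at least one component)."""
    sizes = [len(c) for c in components]
    n_facies_cells = sum(sizes)
    return {
        # Sum over components of (component cells / grid cells).
        "facies_proportion": n_facies_cells / n_grid_cells,
        # Sum over components of (squared component cells / squared facies cells).
        "connection_probability": sum(s * s for s in sizes) / n_facies_cells ** 2,
        "component_density": len(components) / n_grid_cells,
        "unit_component_proportion": sum(1 for s in sizes if s == 1) / len(components),
    }


def box_ratio(component):
    """Component cells over the cells of its axis-aligned bounding box."""
    box_cells = 1
    for axis in range(3):
        values = [c[axis] for c in component]
        box_cells *= max(values) - min(values) + 1
    return len(component) / box_cells


def faces_cells_ratio(component):
    """Faces on the component surface over the number of component cells."""
    faces = sum(1 for (i, j, k) in component for (di, dj, dk) in FACE_OFFSETS
                if (i + di, j + dj, k + dk) not in component)
    return faces / len(component)


def facies_average(indicator, components):
    """Average of a shape indicator over the non-unit components."""
    values = [indicator(c) for c in components if len(c) > 1]
    return sum(values) / len(values) if values else None
```

For example, `facies_average(box_ratio, components)` gives the facies value of the box ratio, ignoring one-cell components as in the last column of Table 1.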

3. Quality analysis considerations

The final purpose of this work is to easily and objectively compare several realizations. The indicators are thus computed on large sets of realizations, which may come from different methods and/or parameters. Then dissimilarity values based on the indicators help to compare the realizations.

3.1. Influence of different grid dimensions

Some cases require comparing realizations on different grids, and the grids may have different dimensions. For instance in MPS, the training image is often larger than the simulation grid to maximize pattern repeatability.

The grid dimensions influence the size of the traversing connected components, such as channels. This impacts in particular the connected component density and the number of component cells. When the grid size varies along the channel direction, the number of cells of the channels also varies. And even though the number of channels does not necessarily change, the grid volume does, which impacts the density. These indicators then highlight differences that are expected from the grid dimensions alone, so their direct use is detrimental to the quality analysis.

We propose two workarounds to compensate for different grid sizes:

• Either sampling the images from the different grids so that all the samples have the same dimensions. The sample dimensions are the largest dimensions common to all the grids. Each sample is randomly extracted, and each image may be sampled several times to still capture the characteristics of the whole image.

• Or correcting the indicators for the difference between the grid dimensions. The smallest grid dimensions among all the grids form a hypothetical reference grid. The indicators are corrected to their expected value in this reference grid. Appendix B details this correction.

The sampling avoids correcting the indicators, but it adds a step and requires the analysis of more images, which could slow down the process. If they are valid, the corrections should give results similar to those of the sampling in a more efficient process.
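The sampling workaround can be sketched as follows, assuming regular grids stored as nested lists indexed `grid[i][j][k]` and a uniform random choice of the sample origin. The function names and the uniform draw are illustrative assumptions, not a description of the tool used for this work.

```python
import random


def common_sample_shape(shapes):
    """Largest dimensions common to all the grids: the minimum along each axis."""
    return tuple(min(shape[axis] for shape in shapes) for axis in range(3))


def random_sample_origin(grid_shape, sample_shape, rng):
    """Draw the origin of a sample that lies entirely inside the grid."""
    return tuple(rng.randint(0, g - s) for g, s in zip(grid_shape, sample_shape))


def extract_sample(grid, origin, sample_shape):
    """Copy the sub-grid of the given shape starting at the given origin."""
    i0, j0, k0 = origin
    ni, nj, nk = sample_shape
    return [[[grid[i][j][k] for k in range(k0, k0 + nk)]
             for j in range(j0, j0 + nj)]
            for i in range(i0, i0 + ni)]
```

Calling `random_sample_origin` several times with the same `random.Random` instance gives the repeated samples of one image mentioned above.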

3.2. Indicator rescaling

The rescaling ensures that the differences between the ranges of indicator values will not affect the comparison. The histogram-based indicators (facies proportion, facies adjacency proportion and node-degree proportion) are not rescaled, to preserve their histogram behavior for the dissimilarity computation (Section 3.3). Two methods can be used for rescaling: normalization and standardization.

The normalization method consists in linearly rescaling the indicator values between 0 and 1. The indicator $I_i$ is the ith indicator of the set previously defined. When computed for the facies f of the realization r, the computed indicator is denoted $I_{if}^r$. The normalization is then obtained by rescaling it between its minimum and maximum values:

$$\mathrm{norm}\left(I_{if}^r\right) = \frac{I_{if}^r - m_{if}}{M_{if} - m_{if}} \quad (1)$$

with $M_{if}$ the maximum value and $m_{if}$ the minimum value for the same indicator and facies among all the images.

The standardization method consists in using centered and reduced indicator values. For an indicator i, the standardized value for a facies f of a realization r is obtained using the following formula:

$$\mathrm{stand}\left(I_{if}^r\right) = \frac{I_{if}^r - \mu_{if}}{\sigma_{if}} \quad (2)$$

with $\mu_{if}$ the mean and $\sigma_{if}$ the standard deviation for the same indicator and facies among all the images. Standardization is an interesting option to focus on the indicator variance. The normalization, on the other hand, decreases the influence of outliers and gives precise limits to the indicator values.
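Both rescaling methods apply to the list of values taken by one indicator for one facies across all the images. The sketch below is a minimal illustration; the use of the population standard deviation and the handling of constant indicators (all values mapped to 0) are assumptions, since the text does not specify them.

```python
from statistics import mean, pstdev


def normalize(values):
    """Formula 1: linear rescaling between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant indicator: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Formula 2: centered and reduced values."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

For example, `normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`.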

3.3. Dissimilarity calculation

The principle of comparing two images is to determine how dissimilar these images are. The indicators can be seen as coordinates of the compared images. These indicators are heterogeneous: they are either based on histograms or on continuous values. The computation of a dissimilarity value between two images requires a heterogeneous metric.

Following the example of Wilson and Martinez (1997), two different metrics are combined into a heterogeneous Euclidean/Jensen–Shannon metric. It uses the Jensen–Shannon distance, the square root of the Jensen–Shannon divergence (Lin, 1991; Rao, 1987), for the histogram-based indicators (facies proportion, facies adjacency proportion and node-degree proportion) and the Euclidean distance for all the other indicators. The distance between two images r and s for a given indicator i of a given facies f is given by:

$$d\left(I_{if}^r, I_{if}^s\right) = \begin{cases} d_{JS}\left(I_{if}^r, I_{if}^s\right) & \text{if } I_{if}^r \text{ and } I_{if}^s \text{ are histograms} \\ d_{E}\left(I_{if}^r, I_{if}^s\right) & \text{if } I_{if}^r \text{ and } I_{if}^s \text{ are continuous values} \end{cases} \quad (3)$$

with I the indicator values. $d_{JS}$ represents the Jensen–Shannon distance:

$$d_{JS}\left(H_i^r, H_i^s\right) = \sqrt{\frac{1}{2} \sum_{j=1}^{n} \left[ H_{ij}^r \log\left(\frac{H_{ij}^r}{\frac{1}{2}\left(H_{ij}^r + H_{ij}^s\right)}\right) + H_{ij}^s \log\left(\frac{H_{ij}^s}{\frac{1}{2}\left(H_{ij}^r + H_{ij}^s\right)}\right) \right]} \quad (4)$$

with $H_i^r$ and $H_i^s$ the histograms of the indicator i for respectively the images r and s, n the number of classes for each histogram, $H_{ij}^r$ and $H_{ij}^s$ the proportions for the class j in the corresponding histograms. $d_E$ represents the Euclidean distance used with rescaled indicators:

$$d_{E}\left(I_{if}^r, I_{if}^s\right) = \sqrt{\left(\mathrm{resc}\left(I_{if}^r\right) - \mathrm{resc}\left(I_{if}^s\right)\right)^2} \quad (5)$$

with $I_{if}^r$ and $I_{if}^s$ the values of the indicator i for the facies f of respectively the images r and s, and resc either norm (formula 1) or stand (formula 2). The final dissimilarity $\delta$ between two images r and s given their respective sets of indicators $I^r$ and $I^s$ is:

$$\delta\left(I^r, I^s, \omega, \nu\right) = \sqrt{\omega_1 \, d_{JS}\left(I_1^r, I_1^s, \nu\right)^2 + \sum_{i=2}^{12} \sum_{f=1}^{n} \omega_i \, \nu_f \, d\left(I_{if}^r, I_{if}^s\right)^2} \quad (6)$$

with $I_1^r$ and $I_1^s$ the facies proportion histograms for the two images, $I_{if}^r$ and $I_{if}^s$ all the other indicator values depending on the indicator and the facies, and n the number of facies. $\omega$ represents the set of weights $\omega_i$ that control the impact of each indicator. $\nu$ represents the set of weights $\nu_f$ that control the impact of each facies. Note that the facies proportion histograms are the only indicators with one result for all the facies. Thus the facies proportions are treated differently from all the other indicators. The Jensen–Shannon distance used in that case is slightly modified:

$$d_{JS}\left(H_i^r, H_i^s, \nu\right) = \sqrt{\frac{1}{2} \sum_{f=1}^{n} \nu_f \left[ H_{if}^r \log\left(\frac{H_{if}^r}{\frac{1}{2}\left(H_{if}^r + H_{if}^s\right)}\right) + H_{if}^s \log\left(\frac{H_{if}^s}{\frac{1}{2}\left(H_{if}^r + H_{if}^s\right)}\right) \right]} \quad (7)$$

The dissimilarity values computed by formula 6 between all the images constitute a non-negative symmetric matrix. This matrix has a zero diagonal corresponding to the dissimilarity between an image and itself. The dissimilarity matrix can be directly visualized with a heat map or treated by multidimensional scaling to get a more practical visualization.
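The heterogeneous metric can be sketched in a few lines. The data layout below (one dictionary per image, with a facies proportion histogram and per-facies indicator values that are either histograms or already rescaled numbers) is a hypothetical choice made for the example, as are the function names. The natural logarithm and the convention that empty classes contribute nothing are also assumptions, since the text does not state the logarithm base.

```python
import math


def js_distance(h_r, h_s, weights=None):
    """Jensen-Shannon distance between two histograms (formulas 4 and 7)."""
    if weights is None:
        weights = [1.0] * len(h_r)
    total = 0.0
    for p, q, w in zip(h_r, h_s, weights):
        m = 0.5 * (p + q)
        # A class with a zero proportion contributes nothing (0 log 0 = 0).
        if p > 0:
            total += w * p * math.log(p / m)
        if q > 0:
            total += w * q * math.log(q / m)
    # max() only guards against tiny negative rounding errors.
    return math.sqrt(0.5 * max(total, 0.0))


def dissimilarity(image_r, image_s, indicator_weights, facies_weights):
    """Dissimilarity between two images, following the structure of formula 6."""
    d_prop = js_distance(image_r["proportions"], image_s["proportions"], facies_weights)
    total = indicator_weights["proportions"] * d_prop ** 2
    for name, per_facies_r in image_r["indicators"].items():
        per_facies_s = image_s["indicators"][name]
        for f, value_r in enumerate(per_facies_r):
            value_s = per_facies_s[f]
            if isinstance(value_r, (list, tuple)):
                d = js_distance(value_r, value_s)   # histogram-based indicator
            else:
                d = abs(value_r - value_s)          # rescaled continuous value
            total += indicator_weights[name] * facies_weights[f] * d ** 2
    return math.sqrt(total)
```

Evaluating `dissimilarity` for every pair of images fills the symmetric matrix described above, whose diagonal is zero by construction.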

3.4. Heat map

The heat map is a simple graphical representation of a matrix where the matrix values correspond to colors. In our case, the heat map is a two-dimensional representation. This colored representation highlights patterns in the dissimilarity matrix, either between realizations or between simulation methods. The main advantage of the heat map is to show the real dissimilarity values, contrary to the multidimensional scaling described in the next subsection.

The heat map also makes it possible to classify the images and/or to apply clustering methods to them. A simple yet informative classification is the ranking according to the dissimilarities of the images to one particular image. When using more advanced clustering methods, the matrix rows and columns are permuted to gather close values into the same cluster.
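The ranking toward one particular image amounts to sorting one row of the dissimilarity matrix. A minimal sketch, assuming the matrix is stored as a list of lists and using a hypothetical function name:

```python
def rank_by_dissimilarity(matrix, reference):
    """Image indices ordered by increasing dissimilarity to the reference image."""
    return sorted(range(len(matrix)), key=lambda s: matrix[reference][s])
```

The reference image always comes first, since its dissimilarity to itself is zero. Reordering the rows and columns of the heat map with this permutation places the images closest to the reference next to it.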

3.5. Multidimensional scaling

Multidimensional scaling (MDS) (e.g., Torgerson, 1952; 1958; see Cox and Cox, 1994 for a review) is a set of data visualization methods to explore dissimilarities between objects, represented by a dissimilarity matrix, through a dimensionality reduction: it aims at producing a configuration of the objects that is as optimal as possible in a lower-dimensional representation.

3.5.1. Principle and method used

Finding the configuration of the images in a k-dimensional representation consists in locating a set of points representing the objects in a k-dimensional Euclidean space, with k being at most equal to the number of images minus one. The points are positioned so that the Euclidean distance d between two points matches as closely as possible the dissimilarity between the corresponding images:

$$d_{r,s} = \sqrt{\sum_{i=1}^{k} \left(x_{ri} - x_{si}\right)^2} \quad (8)$$

with r and s two images, k the number of dimensions of the Euclidean space, $x_{ri}$ and $x_{si}$ the coordinates of respectively r and s in the ith dimension. The number of dimensions k for the MDS representation is an input parameter. When it is equal to the number of images minus one, the distances are normally equal to the dissimilarities. When k is lower, the MDS misrepresents the dissimilarities to some degree.

Several multidimensional scaling methods have been proposed (e.g., Cox and Cox, 1994), depending on the type of dissimilarities and on the way to match the dissimilarities with the distances. The classical scaling (Gower, 1966; Torgerson, 1952; 1958) is the usual method for multidimensional scaling (e.g., Scheidt and Caers, 2009; Tan et al., 2014). It assumes that the dissimilarities already are Euclidean distances. Although this assumption can be relaxed to a metric assumption, i.e., the dissimilarities are distances, Euclidean or not, the classical scaling may further misrepresent dissimilarities based on a heterogeneous metric.

Here we use a different method: Scaling by MAjorizing a COmplicated Function (SMACOF) (De Leeuw, 1977; De Leeuw and Heiser, 1977; 1980). Its goal is to get distances as close as possible to the dissimilarities using a majorization, i.e., the optimization of a given objective function called stress, through an iterative process. The stress derives from the squared difference between the dissimilarities and the distances. It is non-negative and equals 0 only when the distances are equal to the dissimilarities. The optimization process corresponds to a minimization of the stress. The final stress value helps to assess the choice of the number of dimensions: the lower the stress, the better the choice.
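The quantity being minimized can be illustrated with an unweighted, unnormalized raw stress, which is an assumption made for this sketch: the SMACOF implementation used in practice may weight or normalize the stress differently. The function names are hypothetical, and the distances follow formula 8.

```python
import math


def pairwise_distances(points):
    """Euclidean distances (formula 8) between all the points of a configuration."""
    n = len(points)
    return [[math.dist(points[r], points[s]) for s in range(n)] for r in range(n)]


def raw_stress(dissimilarities, points):
    """Sum of squared differences between dissimilarities and distances."""
    d = pairwise_distances(points)
    n = len(points)
    # Each pair of images is counted once.
    return sum((dissimilarities[r][s] - d[r][s]) ** 2
               for r in range(n) for s in range(r + 1, n))
```

A configuration whose distances reproduce the dissimilarities exactly has a raw stress of 0, and any mismatch increases it.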

3.5.2. Validation of the number of dimensions

Depending on the chosen number of dimensions for the representation, the point configuration matches the dissimilarity values more or less closely. Verifying that the number of dimensions is sufficient for a good match between the dissimilarities and the distances is therefore of prime importance. Two approaches allow testing the chosen number of dimensions:

The scree plot: It represents the stress of the SMACOF against the number of dimensions. The stress follows a globally convex decreasing function that tends toward 0 when the number of dimensions increases. A stress close or equal to zero means that the higher dimensions are unnecessary to represent the dissimilarities. The best number of dimensions is between the point of highest curvature of the curve and the beginning of the sill at zero. The dimension value right after the point of highest curvature is generally enough for a decent representation.

The Shepard diagram: It represents the distances against the dissimilarities. The better the correlation, the better the choice of the number of dimensions.

Two dimensions are more practical for analysis purposes. A three-dimensional representation remains a possibility if the improvement over a two-dimensional representation is significant enough.

3.5.3. Estimation of the point position confidence

The point position confidence is another way to assess the MDS ability to represent the dissimilarities. For each point r, an error e highlights the mismatch between the dissimilarities $\delta$ and the distances d with all the other points s:

$$e_r = \sum_{s} \left| \left(a \, \delta_{r,s} + b\right) - d_{r,s} \right| \quad (9)$$

with a and b the linear regression coefficients found on the Shepard diagram. This measure gives a more local view of the misrepresentation than the scree plot or the Shepard diagram.

For visualization purposes, that error is then normalized, giving the confidence c_r for a given image r:

$$c_r = 1 - \frac{e_r - e_{\min}}{e_{\max} - e_{\min}} \qquad (10)$$

with e_max and e_min respectively the greatest and the lowest error values among the errors of all the images. This confidence can then be attributed to its corresponding point in the MDS representation through the point transparency: the less transparent the point is, the better the dissimilarities between this point and all the other points are represented.
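The two formulas above can be sketched in a few lines. The sketch assumes that the Shepard regression is an ordinary least-squares fit of the distances on the dissimilarities over all pairs of points, and that all points get a confidence of 1 when every error is identical (a case the formula leaves undefined). Function names are illustrative.

```python
def shepard_regression(delta, dist):
    """Least-squares fit dist ~ a * delta + b over all pairs r < s (assumed fit)."""
    n = len(delta)
    pairs = [(delta[r][s], dist[r][s]) for r in range(n) for s in range(r + 1, n)]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("all dissimilarities are equal: the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def point_errors(delta, dist, a, b):
    """Error e_r of formula 9 for every point r."""
    n = len(delta)
    return [sum(abs((a * delta[r][s] + b) - dist[r][s]) for s in range(n) if s != r)
            for r in range(n)]


def point_confidences(errors, eps=1e-12):
    """Confidence c_r of formula 10 for every point r."""
    e_min, e_max = min(errors), max(errors)
    if e_max - e_min < eps:  # identical errors: assumed fully confident
        return [1.0] * len(errors)
    return [1.0 - (e - e_min) / (e_max - e_min) for e in errors]
```

By construction, the point with the largest error gets a confidence of 0 and the point with the lowest error a confidence of 1, so the confidence is relative to the images being compared.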

4. Example of method application

The method, as described in the previous sections, consists of three steps:

1. Indicator computation.

2. Dissimilarity computation in a matrix.

3. Dissimilarity visualization and analysis, especially with multidimensional scaling.

The first two steps were implemented in a C++ plugin for the SKUA-GOCAD geomodeling software (Paradigm, 2015). The last step was realized using the software environment for statistical computing R (R Core Team, 2012) with the addition of the R packages SMACOF (De Leeuw and Mair, 2009) and ggplot2 (Wickham, 2009).

4.1. Dataset

The dataset falls within the simulation of a channelized system. It contains several realizations representing the same sedimentary environment simulated with different methods. The analysis aims at highlighting the indicator ability to capture the differences of static connectivity between the realizations, and especially between the realizations from different methods. As it concerns a sole case, it would be inappropriate to draw general conclusions on the simulation methods themselves.

The channelized system is composed of sandy channels with levees in a mudstone environment. A conceptual model, called the training image (TI) (Fig. 4, image at the top), provides an ideal representation of this system. The case study falls within the scope of an MPS study: several simulation methods are used to reproduce the sedimentary bodies observed in the training image. MPS performs better when the training image is larger than the realizations, to ensure enough pattern repeatability. The study thus involves two grids: the first one for the training image (Fig. 4, image at the top) and the second one for the realizations (Fig. 4, images at the bottom).

The training grid contains two sets of images:

TI: One object-based realization simulated using the object-based method of the software Petrel (Schlumberger, 2015) (see Appendix C, Table C.6, for the simulation parameters).

Analog: 100 object-based realizations simulated with the same method and parameters used to simulate the TI (Appendix C, Table C.6).

The simulation grid contains four sets of images:

DeeSse: 100 MPS realizations simulated with the DeeSse implementation (Straubhaar, 2011) of the direct sampling method (Mariethoz et al., 2010). Contrary to more traditional MPS methods, the direct sampling bypasses the conditional probability computation and randomly resamples the training image. It relies on the compatibility, measured with a distance, between the conditioning data and the patterns scanned in the training image. The resampling step selects the first pattern with a distance lower than a given threshold. The training image is the TI and the set of parameters is given in Table C.4 in the appendix.

IMPALA: 100 MPS realizations simulated with the method IMPALA (Straubhaar et al., 2011, 2013). Contrary to DeeSse, IMPALA still computes the conditional probabilities during the simulation. To improve the efficiency of this computation, the method stores the training image patterns in a list. The training image is scanned once at the beginning and the list is used instead during the simulation. The training image is the TI and the set of parameters is given in Table C.5 in the appendix.

OBS: 100 object-based realizations simulated with the same method and parameters used to simulate the TI (Appendix C, Table C.6).

SIS: 100 sequential indicator simulation realizations simulated using variograms based on the facies in the TI (Appendix C, Table C.7).

Fig. 4. Training image and examples of realizations for each category.

4.2. Analysis setting

The purpose here is to compare the realizations with the TI. It should lead to retaining the method and associated parameters that best reproduce the static connectivity of the TI for the studied case. The indicators used in this case study (Table 2) rely on the face-connected components, because the face-connectivity between cells is the most frequently used (Renard and Allard, 2013). All the indicators are equally considered (ω_i = 1 for all i in formula 6). This avoids any subjective bias that could arise from favoring a given indicator. The mudstone environment results from the placement of the channels and levees. It therefore has no precise shape by itself and may blur the analysis. It gets a weight of 0, while channels and levees each get a weight of 1 (ν_mudstone = 0, ν_channel = 1 and ν_levee = 1). Channels and levees are considered equally important to reproduce, but this aspect is related to the case study and could be further discussed. The indicators are normalized to cancel the differences between indicator ranges. Slices of 17 cells along the grid axis with the same orientation as the channels are used for the skeletonization.

Table 2
Set of indicators used for the case study. The indicator definitions are summarized in Table 1 and more detailed descriptions are in Appendix A.

| Category | Indicator | Symbol | Weight |
|---|---|---|---|
| Global indicators | Facies proportion | p | 1 |
| | Facies adjacency proportion | pa | 1 |
| | Facies connection probability | | 1 |
| | Connected component density | | 1 |
| | Unit connected component proportion | pu | 1 |
| | Traversing connected component proportion | pc | 1 |
| Shape indicators | Number of connected component cells | n | 1 |
| | Box ratio | β | 1 |
| | Faces/cells ratio | ζ | 1 |
| | Sphericity | φ | 1 |
| Skeleton indicators | Node degree proportions | pn | 1 |
| | Inverse branch tortuosity | t | 1 |

Fig. 5. View of all the channel connected components within the TI and examples of realizations for each category. The numbers in parentheses are the number of connected components of each image.

Several samples are also randomly extracted from the grids to evaluate the suitability of correcting the indicators when dealing with different grid sizes. The training grid having 500 × 500 × 20 cells and the simulation grid 100 × 150 × 30 cells, the largest common dimensions for the samples are 100 × 150 × 20. The training grid is roughly 10 times larger than the simulation grid. Therefore, 20 samples are extracted from the TI and each analog, whereas 2 samples are extracted from each DeeSse, IMPALA, OBS and SIS realization.
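The sample extraction can be sketched as below. The sketch assumes that a sample is an axis-aligned block whose origin is drawn uniformly at random and that samples are allowed to overlap; the text does not specify either point, and the function names are illustrative.

```python
import random


def common_sample_shape(*grid_shapes):
    """Largest block dimensions that fit in every grid."""
    return tuple(min(dims) for dims in zip(*grid_shapes))


def random_sample_origins(grid_shape, sample_shape, n_samples, seed=None):
    """Lower corners (i, j, k) of randomly placed blocks, possibly overlapping."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, g - s) for g, s in zip(grid_shape, sample_shape))
            for _ in range(n_samples)]


training_grid = (500, 500, 20)
simulation_grid = (100, 150, 30)
sample_shape = common_sample_shape(training_grid, simulation_grid)  # (100, 150, 20)
ti_origins = random_sample_origins(training_grid, sample_shape, 20, seed=0)
obs_origins = random_sample_origins(simulation_grid, sample_shape, 2, seed=0)
```

In the simulation grid the block spans the first two axes entirely, so only the vertical position of a sample varies there.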

4.3. Visual inspection of the realizations

Looking at the connected components (Fig. 5) highlights some expectations for the dissimilarity analysis. Two aspects must be analyzed: the reproduction of the sedimentary body shapes and the reproduction of their connectivity, especially concerning the channels. In the studied case, the reproduction of the shape is fairly easy to analyze visually. The SIS realizations do not display any objects similar to channel/levee systems and are therefore highly dissimilar from the TI. The OBS realizations look similar to the TI, which is expected considering that they come from the same method and parameters. DeeSse realizations have objects similar to channels, even if some continuity issues appear. They also seem to have an insufficient number of channels. IMPALA realizations have quite linear objects, but these poorly reproduce channel and levee shapes.

Estimating the static connectivity in three-dimensional images is more difficult. The TI channels seem highly connected. The objects in the SIS realizations do not locally intersect like channels do and are far too connected. DeeSse realizations contain fewer objects and seem under-connected compared to the TI. The distinction between OBS and IMPALA realizations is difficult concerning the connectivity. Looking at the skeletons of the connected components (Fig. 6) corroborates those observations. DeeSse realizations are clearly under-connected compared with the other categories. SIS ones are over-connected. IMPALA realizations seem a bit more connected than OBS ones. The static connectivity within the training image is clearly heterogeneous.


Fig. 6. View of all the skeletons of the connected components for the TI and for a realization of each category.

4.4. Effect of different grid dimensions on the analysis

The TI, analogs and OBS realizations come from the same method with the same parameter values. The grid size is the only difference between all these images: the grid of the TI and analogs (the training grid) is larger than that of the OBS realizations (the simulation grid).

This difference of grid dimensions directly impacts the connected component density and the number of connected component cells, which are corrected to take such a difference into account (Appendix B). But the realizations coming from the same method still differ when looking at the dissimilarities (Fig. 7, MDS representation from the original images). The OBS realizations within the simulation grid stand out from the TI and the analogs within the training grid. Such a difference is absent from the samples, where all the images have the same size (Fig. 7, MDS representation from the image samples). The grid size seems to clearly impact the dissimilarity values.

However, both MDS representations (Fig. 7) have high stress values with two dimensions and cannot be fully trusted. The heat maps (Fig. 8) clarify that situation.

The heat map from the original images (Fig. 8, bottom left) appears non-homogeneous. A red square symbolizes the significant dissimilarities between the TI and analogs on one side and the OBS realizations on the other side. The heat map from the samples is far more homogeneous, without a red square. The heat maps confirm the impact of the grid size observable on the MDS representations.

Thus, correcting the connected component density and the number of connected component cells is not adequate, and other indicators are impacted by the grid dimensions. The TI, the analogs and the OBS realizations all have similar channel and levee proportions (Fig. 9). The channels and levees occupy the same share of the volume inside the two grids. But the facies connection probabilities for both channels and levees differ between the realizations in the two grids (Fig. 9). The probability that two cells of the same facies belong to the same connected component is higher in the training grid than in the simulation grid. This is consistent with the difference of grid dimensions. When the grid dimension along the channel direction increases, the probability that two channels cross each other to form a single connected component increases too, especially here with sinuous channels. In such a case, the grid size impacts the characteristics of the connected components and the associated indicator values.
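The probability that two cells of the same facies belong to the same connected component can be illustrated with a minimal sketch. It assumes face-connectivity and one common estimator, the sum of the squared component sizes divided by the squared number of cells of the facies; the exact definition used in this work is the one given in Appendix A. The grid is a nested list indexed as grid[i][j][k], and the function names are illustrative.

```python
from collections import deque

# Offsets of the six face-connected neighbors of a cell.
FACE_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def component_sizes(grid, facies):
    """Number of cells of each face-connected component of the given facies."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    seen, sizes = set(), []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if grid[i][j][k] != facies or (i, j, k) in seen:
                    continue
                # Breadth-first search over the component containing (i, j, k).
                seen.add((i, j, k))
                queue, size = deque([(i, j, k)]), 0
                while queue:
                    x, y, z = queue.popleft()
                    size += 1
                    for dx, dy, dz in FACE_NEIGHBORS:
                        u, v, w = x + dx, y + dy, z + dz
                        inside = 0 <= u < nx and 0 <= v < ny and 0 <= w < nz
                        if inside and grid[u][v][w] == facies and (u, v, w) not in seen:
                            seen.add((u, v, w))
                            queue.append((u, v, w))
                sizes.append(size)
    return sizes


def connection_probability(sizes):
    """Assumed estimator: sum of squared component sizes over squared facies cell count."""
    total = sum(sizes)
    return sum(n * n for n in sizes) / (total * total) if total else 0.0
```

With this estimator, merging two components into one always increases the probability while the facies proportion stays unchanged, which matches the behavior described above.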

Comparing samples appears to be essential with grids of different dimensions. Using samples also reveals other aspects of the images. For instance, the different samples coming from the TI are highly dissimilar. This illustrates the non-stationarity of the TI concerning the connectivity: some areas contain only one connected component as the channels are all connected, whereas other areas contain more connected bodies.


Fig. 7. MDS representations of the dissimilarity matrices for the original images (with corrections of the indicators to cancel the effect of the grid dimensions) and samples (of same size). The scree plot for the original images only displays the stress values up to 10 dimensions out of 200 possible. The scree plot for the samples only displays the stress values up to 10 dimensions out of 2220 possible.

4.5. Comparing the connectivity of the training image and of the realizations

The purpose is now to compare the training image to all the realizations. These realizations come from different methods, but all borrow their input from the training image and have to reproduce the sedimentary bodies of the training image. All the following analysis relies on the image samples and not on the original images, to avoid any bias due to the difference of size between the training image and the realizations.

4.5.1. Analysis of the dissimilarities

The dissimilarities give a first insight into the relationships between the different realizations (Fig. 10). The training image samples fall within the OBS samples, highlighting the similarity of these images. The samples from the multiple-point methods, DeeSse and IMPALA, are close to the OBS samples, but they do not mix much. All these images are therefore not completely similar. Furthermore, the DeeSse and IMPALA samples remain away from the TI samples. The SIS samples are clearly distinct from all the other samples, and are the most distant from the TI samples.

Fig. 8. Heat map representations of the dissimilarity matrices for the original images (with corrections of the indicators to cancel the effect of the grid dimensions) and samples (of same size). Only one triangle of the symmetric matrices is represented.

Although the confidence of the two-dimensional MDS representation is not high, the heat map confirms those observations (Fig. 11). The first row shows the dissimilarities between the training image samples and the realization samples. The whitest samples, the OBS ones, are the closest to the TI. The reddest samples, the SIS ones, are the furthest from the TI samples. The DeeSse and IMPALA samples fall in between, and seem equally close to the TI samples. Globally, the differences between all the methods are significant, as highlighted on the MDS.

As observed in the previous section, the training image samples are dissimilar from one another. This shows the heterogeneity of static connectivity within the training image. Concerning the realizations, the OBS realizations are also dissimilar from one another, whereas the SIS realizations are all really close. Both DeeSse and IMPALA realizations are more spread out than the SIS ones, but not as much as the OBS realizations. All this points to a diversity of static connectivity that varies between the methods in this case study. Going back to the indicators helps to further analyze such behavior.

4.5.2. Analysis of the indicators

The indicator values for the channels (Figs. 12 and 14) and levees (Figs. 13 and 15) differ depending on the category. The differences are more or less clear depending on the indicator, whose behavior differs between the two sedimentary body types.

The similarity between the OBS samples and the training image samples also appears in the indicator values. These values are close to the TI values for the channels, and for many indicators they are the closest. That trend is less obvious with the levees, whose values are less close. But the levee density is the only indicator to be really far from the TI values. All this confirms the close relationship between the training image and the OBS realizations concerning the static connectivity. It also confirms the visual observations. This is consistent with the use of the same method and parameters to simulate the training image and the OBS realizations.

Similarly, the significant dissimilarity between the SIS samples and the TI samples also appears in the indicator values. This is obvious on the traversing component proportion or the component density. The high component density means a higher number of connected components compared to the other samples. On the other hand, the average number of component cells is quite low, meaning that most of these numerous components are small. The low traversing component proportion signifies that most of these components are not continuous enough to represent channels or levees. Concerning channels, the significant difference between the SIS and TI samples for the shape indicators (number of component cells, box ratio, faces/cells ratio and sphericity) implies that the SIS components do not look like channels. This different shape also appears on the node degrees, with far higher node degree values than for the other categories, implying a less linear shape. Despite numerous and small components, the channel connection probability remains high. This means that these samples must contain one large component. This component must be traversing, as the traversing component proportion is not equal to zero. All those observations are consistent with the visual inspection of the realizations, and confirm a significant difference of static connectivity between the training image and the SIS realizations. Many indicators also display a narrow range of values. This confirms the low variability between the SIS samples concerning the connectivity, as seen with the dissimilarities.

Fig. 9. Box-plots comparing the facies proportions and facies connection probability for the TI, some TI analogs and the OBS realizations.

DeeSse and IMPALA samples have similarities with the SIS samples, especially more numerous and smaller connected components than in the training image, as visible on the component density and the number of component cells. Similarly, the shape indicators show a significant difference between the TI samples and both the DeeSse and IMPALA samples. The higher sphericity implies in particular less linear shapes for the channels. Although both are equally dissimilar to the TI samples, the other indicators show significant differences between the DeeSse and the IMPALA samples. The DeeSse samples have far lower channel and levee proportions. This impacts the facies adjacency, with channels and levees being more adjacent to the mudstone. But the most relevant difference between the DeeSse and IMPALA samples comes from the channel connection probability: the connection probability of the DeeSse samples is lower than that of the TI samples, whereas the connection probability of the IMPALA samples is higher than that of the TI samples. The IMPALA samples have a behavior similar to that of the SIS, with a few large components among smaller ones. But the connectivity of these components is not completely similar to that of the SIS. This is especially visible on the node degree proportions, with the IMPALA samples having an intermediate behavior between the TI and the SIS samples. On the other hand, the connectivity of the DeeSse samples seems lower. The higher degree-two proportion of the DeeSse samples implies few intersections between channels. The higher degree-one proportion also implies more discontinuous components. Again, all of this is consistent with the visual observations: DeeSse channels are clearly identifiable but discontinuous, whereas IMPALA channels are less visible, with many intersections.
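The reading of the node degree proportions can be illustrated with a small sketch. It assumes that a skeleton is given as a list of edges between node identifiers, and it does not separate the degree-one nodes on a grid border from those inside the grid, as the indicator of this work does. The function name is illustrative.

```python
from collections import Counter


def node_degree_proportions(edges):
    """Proportion of skeleton nodes of each degree, from a list of (node, node) edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    return {d: c / len(degree) for d, c in sorted(counts.items())}


# One continuous, non-intersecting channel: only end nodes and degree-two nodes.
single_channel = [("a", "b"), ("b", "c"), ("c", "d")]
# Two channels crossing at node "x": one node of degree four appears.
crossing_channels = [("x", "n"), ("x", "s"), ("x", "e"), ("x", "w")]
```

Cutting a channel into several pieces adds end nodes and so raises the degree-one proportion, whereas intersections raise the proportions of degrees three and above.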

In this case, the indicators confirm what comes from the dissimilarities: the OBS realizations are the most similar to the training image from a static connectivity perspective. This is consistent with the visual observations, and with the use of the same method to simulate the TI and the OBS realizations. The next section endeavors to compare those results with what can be obtained with multiple-point histograms.

4.6. Comparison with multiple-point histograms

Multiple-point histograms or pattern histograms have made their way as indicators of realization quality with MPS methods (e.g., Boisvert et al., 2010; Tan et al., 2014). We propose here to compare the results obtained with those histograms to the previous results. The histograms are based on a 3 × 3 × 3 pattern and are computed on three levels of multi-grids (Tran, 1994), giving three histograms per image. The dissimilarity δ between two images r and s is adapted from the work of Tan et al. (2014):

$$\delta\left(H^{r}, H^{s}\right) = \sum_{l=1}^{3} \frac{1}{2^{l}}\, D_{JS}\left(H^{r}_{l}, H^{s}_{l}\right) \qquad (11)$$

with H^r and H^s the sets of three histograms for each image, l the multi-grid level and D_JS the Jensen–Shannon divergence, which is the squared Jensen–Shannon distance. A multi-grid level l of 1 corresponds to the finest level and here 3 is the coarsest level. The coarser levels characterize the large-scale behavior of the sedimentary bodies. But they induce a loss of information. This justifies the decreasing weights when the multi-grid level increases. Similarly to the work using multiple-point histograms, the comparison is directly made on the original images, not on samples.
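The dissimilarity of formula 11 can be sketched as follows. The sketch assumes that each histogram is a mapping from a pattern identifier to a count, that histograms are normalized to frequencies before comparison, and that the Jensen–Shannon divergence uses base-2 logarithms (which bounds it between 0 and 1); none of these choices is specified in the text. Function names are illustrative.

```python
import math


def normalize(hist):
    """Convert pattern counts to frequencies."""
    total = sum(hist.values())
    return {pattern: count / total for pattern, count in hist.items()}


def js_divergence(hist_p, hist_q):
    """Jensen-Shannon divergence with base-2 logarithms (assumed base)."""
    p, q = normalize(hist_p), normalize(hist_q)
    divergence = 0.0
    for pattern in set(p) | set(q):
        pk, qk = p.get(pattern, 0.0), q.get(pattern, 0.0)
        mk = 0.5 * (pk + qk)
        if pk > 0.0:
            divergence += 0.5 * pk * math.log2(pk / mk)
        if qk > 0.0:
            divergence += 0.5 * qk * math.log2(qk / mk)
    return divergence


def mph_dissimilarity(hists_r, hists_s):
    """Formula 11: divergences weighted by 1 / 2**l, level 1 being the finest."""
    return sum(js_divergence(h_r, h_s) / 2 ** level
               for level, (h_r, h_s) in enumerate(zip(hists_r, hists_s), start=1))
```

With these assumptions, two images sharing no pattern at any of the three levels reach the maximum dissimilarity of 1/2 + 1/4 + 1/8 = 0.875.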

The observations about the category relationships made with the previous indicators (Fig. 10) remain valid on the MDS representation from the multiple-point histograms (Fig. 16). The training image falls within the OBS realizations. The DeeSse and IMPALA realizations are close to the OBS ones, but with a clear separation. They all remain separated from the TI. Again, the SIS realizations are far away from all the other images, including the TI. The main difference with the previous indicators comes from the variability within a category. This is especially noticeable with the SIS realizations, which seem to have a significant pattern variability.

The two-dimensional MDS representation is here again a poor representation of the dissimilarities, with a high stress. Only the dissimilarities with the training image are kept to directly study them and compare the ranking between different indicators (Fig. 17). Looking at all the connected component indicators, i.e., all the indicators described in Table 1, confirms the conclusions drawn from Fig. 10: the OBS realizations are the closest to the TI, the SIS ones the furthest, and the DeeSse and IMPALA realizations stand in between. Similar rankings come from the shape indicators (number of component cells, box ratio, faces/cells ratio and sphericity) and from the skeleton indicators (node degree proportions and inverse branch tortuosity).

The multiple-point histograms also give a similar ranking, with a clearer separation between the SIS realizations and the other realizations (Fig. 17, All the multi-grid levels). However, the dissimilarities between the training image and the DeeSse realizations vary significantly between the multi-grid levels. The largest multi-grid level even places the DeeSse realizations closer to the TI than the OBS realizations. This level characterizes the large-scale behavior of the sedimentary bodies. Such a ranking is then particularly surprising due to the presence of discontinuous bodies within the DeeSse realizations, which appear neither within the OBS ones nor within the TI. These continuity differences are confirmed by the skeletons, especially the higher proportion of nodes of degree one inside the grid for the DeeSse than for the OBS realizations.

Fig. 10. MDS representation of the dissimilarities between the samples of the case study generated using SMACOF and validation graphs. The scree plot only displays the stress values up to 10 dimensions out of 820 possible.

Fig. 11. Heat map representation of the dissimilarity matrix computed based on the samples of the case study.

5. Discussion

The previous section highlights the ability of the method to distinguish realizations by focusing on the static connectivity through the connected components. This section discusses some aspects of the analysis process.

5.1. About the indicators

All the indicators proposed here rely more or less directly on the connected components. Some of them are classical, such as the facies proportion, but as highlighted in Fig. 9, the facies proportion is not enough to characterize the static connectivity. New indicators are introduced here compared to previous studies on connected components (De Iaco and Maggio, 2011; Deutsch, 1998). Some indicators help to better characterize the component organization, such as the traversing component proportion or the component density. Other indicators aim to better characterize the component shape, such as the sphericity. Using skeletons is also a new feature to compare realizations. The node degree proportion appears to give many details about the connectivity. The branch tortuosity has been less useful for the studied case, with a poorer discrimination of the realizations. This is due to the parameterization of the skeletonization, which favors the topology at the cost of the geometry of the skeletons.


Fig. 12. Box-plots comparing the range of indicators computed on the channels for the different categories, except the node degree proportions.

The use of multiple-point histograms as indicators in a method similar to Tan et al. (2014) shows a ranking close to that with the connected component indicators. However, they do not characterize the realizations in the same way. The multiple-point histograms of the finest multi-grid or multi-resolution level characterize in detail the shape of the sedimentary bodies. The shape indicators are global measures over a whole connected component. As connected components can have variable shapes due to the intersections between sedimentary bodies, being able to characterize the component shape more finely is an interesting asset. From this point of view, the multiple-point histograms could bring further information on the connected component shape.

However, the multiple-point histograms do not measure the static connectivity: they compare the patterns between the images, but not really the relationships between the patterns. The study of the coarsest multi-grid or multi-resolution levels attempts to look at the large-scale behavior of the sedimentary bodies. But many details are lost in the process, which justifies the lower weights for these levels in the dissimilarity from multiple-point histograms (Tan et al., 2014). And it still does not characterize the static connectivity. From this point of view, the skeletons describe more precisely the large-scale behavior of the components and their connectivity.

Fig. 13. Box-plots comparing the range of indicators computed on the levees for the different categories, except the node degree proportions.

5.2. About indicator comparison

As stated in the previous section, a single indicator is not enough to fully characterize the static connectivity. Comparing several indicators leads to more relevant information about the realizations and how much they differ from the viewpoint of connectivity. Comparing realizations on grids of different dimensions leads to issues not addressed by previous studies (De Iaco and Maggio, 2011; Deutsch, 1998). A correction on the two most affected indicators is not sufficient to compensate for different grid dimensions. Sampling the images appears to be more efficient, and also helps to analyze the connectivity heterogeneity within the images. The question of the sampling representativeness remains to be explored.

Using a metric is very useful, because it gathers all the indicator values into one dissimilarity value and facilitates the comparison of the realizations and the analysis. Tan et al. (2014) already used such a process with multiple-point histograms. We have applied a similar principle to connected components, gathering many indicators into values easier to analyze. The introduction of a heterogeneous metric gives the opportunity to gather indicators of different types and further improves the ability of the method to characterize the static connectivity of the realizations. In the end, the dissimilarities distinguish the realizations from different methods and parameter values, but also characterize the static connectivity variability between the realizations of a given method and parameter values.

Fig. 14. Mean node degree proportions of the channel skeletons for each category. The error bars display the minimum and maximum proportions. The first node degree 1 corresponds to the nodes of degree one along a grid border. The second node degree 1 corresponds to the nodes of degree one inside the grid.

Adding weights to the indicators in the metric computation means more flexibility for the user. Indeed, not all the indicators are significant for all the applications. For instance, for a flow simulation purpose, the unit component proportion is not necessarily significant, because the unit-volume components have a lesser impact on the flow than channels. But such weights remain optional. In the case study, we did not discriminate the indicators with weights, because we wanted to study the information provided by all the indicators on the realizations. Studying the indicator values after the dissimilarities remains essential to better understand the static connectivity of the realizations.

5.3. About the skeletonization method

Skeletons enable a better characterization of both the geometry and the topology of connected components. However, the skeletonization method influences both the geometry and the topology of the resulting skeletons. Among all the skeletonization methods, Cornea et al. (2007) distinguish the thinning-based method as the method with the best control on the skeleton connectivity. This section aims at comparing the result of a thinning-based method with the method introduced in Section 2.3 based on slicing the grid and computing the connected components, denoted as the slicing-based method. The thinning-based method used hereafter is the algorithm defined by Lee et al. (1994) and implemented in the geomodeling software Gocad by Barthélemy and Collon-Drouaillet (2013).

Fig. 15. Mean node degree proportions of the levee skeletons for each category. The error bars display the minimum and maximum proportions. The first node degree 1 corresponds to the nodes of degree one along a grid border. The second node degree 1 corresponds to the nodes of degree one inside the grid.

The thinning-based method appears to perform better in two dimensions than the slicing-based method. But in three dimensions it tends to generate many small-scale loops (Fig. 18), which perturb both the topology and the geometry of the skeletons. The primary goal of the skeletons is to better characterize the large-scale topology, and possibly the geometry, of the connected components. The skeletons from the thinning-based method seem too perturbed to help in that characterization. The slicing-based method, on the other hand, does not necessarily capture those small-scale elements due to the slice size. A large slice size may not capture the small components or all the component irregularities, but this is compensated in some way by the other indicators, in particular the shape indicators. Moreover, the thinning-based method tends to generate skeletons with many nodes, which are heavy to manipulate. The slicing-based method does not have the same issue when using quite high slice thicknesses. This aspect can be essential when dealing with several hundreds of images.

All this leads us to favor the slicing-based method in this work. Some aspects still need to be explored, such as the impact of the slice size. Many more skeletonization methods exist, and skeletonizing three-dimensional shapes remains an open debate. Further work could be done to study other methods and the topology and geometry of the resulting skeletons.

Fig. 16. MDS representation of the dissimilarities between the images of the case study generated using SMACOF and validation graphs. The dissimilarities are based on the multiple-point (MP) histograms of the images. The scree plot only displays the stress values up to 10 dimensions out of 400 possible.

5.4. About MDS methods and accuracy

We rely on the Scaling by MAjorizing a COmplicated Function as multidimensional scaling method to represent the dissimilarities. The SMACOF signiﬁcantly facilitates the dissimilarity analysis. However, the dimensionality reduction makes the MDS representations imprecise, and the distances between the points tend to differ from the dissimilarities.

Thus, the MDS is not a simple visualization tool and can impact the analysis. This can be illustrated by comparing the MDS representation from the classical scaling (Fig. 19) and that from the SMACOF (Fig. 20) to analyze the dissimilarities between the original realizations and the original training image (and not the samples). Normally, the TI should stand apart from the realizations (see Fig. 7). But the classical scaling puts the TI close to the OBS and IMPALA realizations. Only the point position confidence shows that the TI position is wrong on the representation. The SMACOF representation separates the TI more clearly from the other images.

Moreover, although the global relationships between the realization categories are similar in the two representations, the relative position of the images can be significantly different. This is clear with the TI, but also with other images (Table 3). It appears more broadly on the Shepard diagram, with a better coefficient of determination r² for the SMACOF than for the classical scaling. The classical scaling tends here to decrease the dissimilarities. As a result, the realization ranking can differ between the dissimilarities and the MDS distances (Table 3). Thus, analyzing the sole MDS representation can lead to erroneous interpretations.

Table 3

Comparison of dissimilarities and distances between the TI and DeeSse realizations 12 and 76 for two MDS methods.

| Compared images | Dissimilarity | Classical scaling distance | SMACOF distance |
| --- | --- | --- | --- |
| TI - DeeSse 12 | 1.579 | 0.596 | 1.414 |
| TI - DeeSse 76 | 1.358 | 0.854 | 1.265 |
| DeeSse 12 - DeeSse 76 | 0.905 | 0.259 | 0.203 |
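The ranking inversion can be checked directly from the values of Table 3. The short Python sketch below finds which DeeSse realization is closest to the TI according to each measure, then computes a coefficient of determination between the dissimilarities and each set of MDS distances. The helper names are hypothetical, and the r² is computed on the three pairs of Table 3 only, so it is not the r² of the full Shepard diagram.

```python
# Values copied from Table 3:
# (dissimilarity, classical scaling distance, SMACOF distance).
TABLE_3 = {
    ("TI", "DeeSse 12"): (1.579, 0.596, 1.414),
    ("TI", "DeeSse 76"): (1.358, 0.854, 1.265),
    ("DeeSse 12", "DeeSse 76"): (0.905, 0.259, 0.203),
}
MEASURES = ("dissimilarity", "classical scaling", "SMACOF")


def closest_to_ti(measure_index):
    """Return the DeeSse realization closest to the TI for one measure."""
    candidates = {pair[1]: values[measure_index]
                  for pair, values in TABLE_3.items() if pair[0] == "TI"}
    return min(candidates, key=candidates.get)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


columns = list(zip(*TABLE_3.values()))  # one tuple of three values per measure
for index, name in enumerate(MEASURES):
    print(name, "-> closest to TI:", closest_to_ti(index))
for index in (1, 2):
    print("r2 of", MEASURES[index], "vs dissimilarity:",
          r_squared(columns[0], columns[index]))
```

With these values, the dissimilarity and the SMACOF distance both place DeeSse 76 closer to the TI than DeeSse 12, whereas the classical scaling distance reverses this order.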

The choice of the MDS method is significant, as is the choice of the number of dimensions. We have favored two-dimensional MDS representations for the sake of visibility, but three-dimensional representations would be worth testing. In any case, the MDS representation should always be studied cautiously, keeping in mind its misrepresentation of the dissimilarities. From this point of view, the heat map facilitates the analysis of the real dissimilarity values. A single row or column of the dissimilarity matrix, which compares one image with all the others, is as easy to analyze as an MDS representation, but only on a subset of the images.

Just as the MDS facilitates the dissimilarity analysis, the dissimilarities simply make the indicator analysis easier. After looking at the MDS representation, it is essential to go back to the dissimilarity values to validate the observations. Similarly, studying the indicator values validates the observations and helps to further understand the differences in connectivity between the images.


Fig. 17. Box-plots comparing the realizations of each method with the TI. The dissimilarities depend on different indicators in each box-plot. Only the multiple-point histograms compare the original images directly, with one dissimilarity value with the TI per image. The other categories are based on image samples, and have several dissimilarity values with the TI per image. These values are averaged to obtain a single dissimilarity value with the TI.

5.5. Impact of the connected components on the flow

Facies heterogeneity shapes the fluid flow in the subsurface. Thus, the petrophysical property simulation, directly correlated to the facies modeling, constitutes a preliminary step to further simulate flows. Being able to assess the static connectivity at the beginning of the workflow could constitute a real advantage in terms of resources and time. It is also a way to ensure a better geological consistency, which in itself allows a better integration of field observations and measurements (seismic information, well data, etc.).

It would be interesting to apply the method to more detailed facies models than those of the case study. For instance, both channel and levee deposits are often heterogeneous, regardless of the sedimentary environment, with porous deposits more or less nested between flow barriers (e.g., Hubbard et al., 2011; Hansen et al., 2015). Such flow barriers take the shape of mudstone drapes along the channel margins, of margin failure deposits, of channel abandonment deposits, etc. They can have a significant impact on the fluid flow (e.g., Labourdette et al., 2006; Pranter et al., 2007; Alpak et al., 2013; Issautier et al., 2013) and on the aquifer compartmentalization, with sometimes important consequences when they are ignored (see for instance Gainski et al., 2010, in an oil exploitation context). Our method could clearly help to distinguish between several images based on their differences in static connectivity, whether these images are realizations from different methods and/or parameter values or referring models. From this perspective, the case study shows the method's ability to identify the simulation methods that produce subsurface models consistent with the static connectivity of a referring model.

Such an approach is particularly adapted to fluvial and turbiditic channelized environments, where channels tend to form high-connectivity corridors leading to channelized flow paths. However, the static connectivity of a sedimentary body is not always representative of the flow behavior. For example, flow channeling can also emerge from non-channelized but highly heterogeneous bodies (e.g., Park et al., 2008; Fiori and Jankovic, 2012). This highlights the dependence of the hydrodynamic connectivity on many parameters: the permeability contrasts between the different media, the internal heterogeneity of each medium, etc. Flow simulations then require assigning the petrophysical properties to each facies, usually with geostatistical methods (e.g., Deutsch and Journel, 1992). Our metrics obviously do not anticipate the results of such procedures and, thus, just measure the consistency of the facies simulations in terms of static connectivity.

Depending on the studied environment, the reproduction of the static connectivity could be secondary and one could directly work on hydraulic connectivity through the corresponding properties. Although reproducing the static connectivity does not guarantee reproducing the exact hydraulic connectivity, it remains a step toward a better integration of geological information and knowledge in the physical description of the media. Our method provides a simple and objective basis for the comparison of large sets of realizations from this static connectivity point of view.
